Contextuality is central both to the foundations of quantum theory and to novel information processing tasks. Although it was recognized before Bell's nonlocality, and despite some recent proposals, it still faces a fundamental problem: how to quantify its presence? In this work, we introduce two measures of contextuality. One is a direct analogue of a known measure of non-locality, called the contextuality cost. The other can be viewed as an analogue of the relative entropy of entanglement and is called the relative entropy of contextuality. Based on the fundamental fact that a contextual system cannot be described by a single joint probability distribution, we introduce another measure of contextuality, called the optimal contextual correlation factor, and prove that it equals the relative entropy of contextuality, thereby providing an operational derivation of the latter. We further show that it is monotonic under some of the operations which preserve non-contextuality. We provide a lower bound on the relative entropy of contextuality by computing its variant, called the uniform relative entropy of contextuality, for boxes that possess high symmetries, including the Popescu-Rohrlich, Peres-Mermin, Mermin and Klyachko boxes. In special cases we prove the additivity of the uniform relative entropy of contextuality, while in others we show that it is additive in two copies. We also compute the contextuality cost of some boxes, and observe that it does not increase under operations which preserve non-contextuality.



Introduction. Non-locality is one of the most interesting manifestations of the quantumness of physical systems [1]. It reflects the strength of the correlations that arise when a quantum state is measured independently by the distant parties that share it; these correlations are sometimes stronger than those obtainable from classical resources, and can be stronger still for super-quantum but non-signaling resources ^. Non-locality has been studied in detail in recent decades, and has been identified as a resource for different information-theoretic tasks [5] such as device-independent quantum cryptography @]-[7] and quantum communication complexity [8]. It has been formulated in terms of 'boxes', i.e. families of probability distributions, and has been studied both qualitatively, through the so-called Bell inequalities, and quantitatively, through measures of non-locality such as the cost of non-locality and distillable non-locality [21 |9l-[T^, or recently its (anti)robustness [13].

There is, however, another phenomenon, known even earlier than Bell's non-locality, called quantum contextuality [H]. Namely, for certain sets of observables, some of which may be commensurable, the results could not preexist prior to the measurements, since otherwise one would obtain a logical contradiction, sometimes called the Kochen-Specker paradox [TS]. In recent years, this phenomenon has been studied in depth. New examples of Kochen-Specker proofs of contextuality have been found [TBHT5] (see also [ini [SU] and references therein for recent results), and counterparts of Bell inequalities have been introduced, albeit in a state-independent fashion [2T], i.e. inequalities that are violated by any quantum state (see also the state-dependent attempts of [HI [53] and [231 dS] for more recent achievements). The fact that quantum theory is contextual has also been tested experimentally [2BH^; see also [2M5^ and references therein for recent results. In fact, the phenomenon of non-locality is a special case of contextuality: the commensurability relations are provided by the fact that the observables are measured on separate systems. The converse does not hold: contextuality is more basic, as it can occur in single-partite systems. To our knowledge, a quantitative approach to contextuality has not been developed so far. More specifically, apart from the memory cost of contextuality [33] and the ratio of contextual assignments [34], no measure of contextuality has been proposed which is defined also on single systems and is faithful, i.e. nonzero iff the system is contextual. Moreover, the properties of the already introduced measures have not been studied from the point of view of entanglement theory. The aim of this work is to propose two measures of contextuality (and their variants), valid in both single-party and many-party scenarios, and to study their properties. We note here that our aim is to propose measures which quantify contextuality that is both state-independent and state-dependent.

Our main object of interest is a set of observables, together with the relations of commensurability among them. Its formal description is a hypergraph, where vertices are observables and hyperedges are sets of commensurable observables, further called contexts. With such a hypergraph we associate a probability space. A box is then a family of distributions on this probability space. The cardinality of the input of the box is the number of contexts, and the cardinality of the output for a given context is the product of the output cardinalities of the observables that form that context. One can also use another description of the problem, considering instead a commensurability graph where vertices are observables and edges indicate that two observables are commensurable. Such a description is equivalent to the hypergraph one, where each set of vertices that are all connected with each other corresponds to a context in the hypergraph. We note here that the often used orthogonality graph (see e.g. [111 UHl US] and references therein) is a special case of the commensurability graph, hence our approach includes that one.

In what follows, we define two measures of contextuality. The first is a direct analogue of a non-locality measure called the nonlocality cost [2]. We call it the contextuality cost and show that it has the good properties of a measure, apart from the fact that it is not an extensive measure, i.e. it is not proportional to the dimension of the system. We compute it for some families of single-party boxes, in analogy with how it is done in the non-locality scenario.

Our main result is the introduction of the second measure, called the relative entropy of contextuality. We build on the fundamental fact that, in contrast to a non-contextual box, which can be described by a single joint probability distribution, one needs at least two joint probability distributions to describe a contextual box. This enables us to introduce in a natural way a measure which quantifies correlations with the variable that numbers the contexts. We call this measure the optimal contextual correlation factor. We then show that these two measures are equal, hence providing an operational derivation of the relative entropy of contextuality. We then compute its variant, called the uniform relative entropy of contextuality, for some quantum (and super-quantum) boxes; this variant is a lower bound on the relative entropy of contextuality. To this end, we introduce operations that symmetrize a box by randomly applying permutations of observables and bit-flips of their outputs, called twirling, a counterpart of the depolarization operations introduced in [36, 37] for the case of 2 parties with two binary inputs and outputs (in entanglement theory, the use of symmetries goes back to the seminal paper of Werner [55], and was further developed e.g. in [sunn]). We call the set of boxes invariant under twirling isotropic. For extremal isotropic boxes such as the Popescu-Rohrlich (PR) [5], Peres-Mermin (PM) [TBI [TT], Mermin (M) [JSj or chain $CH^{(n)}$ box (see ^ and references therein), we prove additivity of the uniform relative entropy of contextuality, while for isotropic boxes we show that it is 2-copy additive, i.e. its value for 2 copies is twice that for a single copy. We then prove monotonicity of both the relative entropy of contextuality and its uniform version under operations that are mixtures of compositions of channels acting on each observable independently. It is easy to see that the relative entropy of contextuality, as well as its uniform version, is faithful.

Preliminaries. We consider a hypergraph $G = (V_G, E_G)$ with $V_G = \{A_1, \ldots, A_k\}$ being the set of observables (vertices) and $E_G \subseteq \mathcal{P}(V_G)$ the set of sets of mutually commensurable observables (the set of contexts). We then consider boxes, i.e. families of probability distributions. We say that a box is compatible with the hypergraph $G$ if it is a family of $n$ probability distributions, with $n$ being the cardinality of the set $E_G$, i.e. the number of contexts, such that for each $c = \{A_{i_1}, \ldots, A_{i_{|c|}}\} \in E_G$, where $|c|$ is the cardinality of the context $c$, there is a corresponding probability distribution in this family on $\Omega(A_{i_1}) \times \ldots \times \Omega(A_{i_{|c|}})$. The $\Omega$ are the sets of outcomes of the corresponding observables. We denote the box as a family of distributions $\{P(A_{i_1}, \ldots, A_{i_{|c|}} | c)\}_{c \in E_G}$. Given a hypergraph $G$, by a consistent box we mean a box which has well-defined distributions of commensurable observables, i.e. distributions that are independent of the context in which the observables appear (see the formal Definition 1). Note that the well-known non-signaling condition [2] is a special case of consistency defined in this way. The set of all consistent boxes compatible with a hypergraph $G$ with $n$ contexts we denote as $C_G^{(n)}$. By non-contextual boxes we mean consistent boxes with the property that there exists a common joint probability distribution for all the observables in $V_G$. The set of all such boxes compatible with $G$ we denote as $NC_G$. [SS] All boxes that are consistent but do not satisfy this condition we call contextual. To specify the distributions that belong to a box $B \in C_G^{(n)}$ we will denote it as $\{g(\lambda_c)\}$, where $c$ numbers the contexts, running from 1 to $n$. To denote the probability of an event $i$ of the distribution of context $c$ we will write $g(\lambda_c)_i$, or just $g(i)$ if the context is known. When a box $B$ is non-contextual, we denote it as $\{p(\lambda_c)\}$, and by $p(\lambda)$ we will denote the joint probability distribution on $V_G$ (which exists by definition of a non-contextual box) of which the $p(\lambda_c)$ are the appropriate marginals.
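The consistency condition can be made concrete with a small check. The following is a minimal Python sketch, not taken from the original work: the dictionary encoding of a box, the helper names `marginal` and `is_consistent`, and the two example boxes are assumptions chosen for illustration. A box is stored as a map from a context (a tuple of observable names) to a distribution over outcome tuples, and the check compares the marginals on shared observables for every pair of contexts.

```python
from itertools import product

def marginal(dist, context, subset):
    """Marginal of a context distribution on the observables listed in `subset`."""
    positions = [context.index(obs) for obs in subset]
    out = {}
    for outcome, prob in dist.items():
        key = tuple(outcome[pos] for pos in positions)
        out[key] = out.get(key, 0.0) + prob
    return out

def is_consistent(box, tol=1e-12):
    """True if overlapping contexts agree on the marginals of shared observables."""
    contexts = list(box)
    for i, c1 in enumerate(contexts):
        for c2 in contexts[i + 1:]:
            shared = [obs for obs in c1 if obs in c2]
            if not shared:
                continue
            m1 = marginal(box[c1], c1, shared)
            m2 = marginal(box[c2], c2, shared)
            for key in set(m1) | set(m2):
                if abs(m1.get(key, 0.0) - m2.get(key, 0.0)) > tol:
                    return False
    return True

def parity_dist(parity):
    """Uniform distribution over pairs of bits with the given parity."""
    return {b: 0.5 for b in product((0, 1), repeat=2) if sum(b) % 2 == parity}

# Illustrative four-context cycle: three correlated contexts, one anticorrelated.
pr_box = {
    ("A1", "A2"): parity_dist(0),
    ("A2", "A3"): parity_dist(0),
    ("A3", "A4"): parity_dist(0),
    ("A4", "A1"): parity_dist(1),
}

# Illustrative inconsistent box: A1 and A2 become deterministic in one context only.
bad_box = dict(pr_box)
bad_box[("A1", "A2")] = {(0, 0): 1.0}

print(is_consistent(pr_box), is_consistent(bad_box))
```

Consistency alone does not decide contextuality: the first example passes this check although, as the text explains, a consistent box may still admit no common joint distribution.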

We now give exemplary families of boxes for which we are able to compute the contextuality measures. To have a concise notation, we distinguish two probability distributions, $P^{(m)}_{even}$ and $P^{(m)}_{odd}$, of random variables $(\lambda_1, \ldots, \lambda_m)$. They are probability distributions on $\{0,1\}^m$ such that $P^{(m)}_{even} = \frac{1}{2^{m-1}}$ for all events such that $\lambda_1 \oplus \lambda_2 \oplus \ldots \oplus \lambda_m = 0$ and zero otherwise, while $P^{(m)}_{odd} = \frac{1}{2^{m-1}}$ for all events such that $\lambda_1 \oplus \lambda_2 \oplus \ldots \oplus \lambda_m = 1$ and zero otherwise, where $\oplus$ denotes addition modulo 2. The boxes which have only two types of distributions of contexts, $P_{even}$ and $P_{odd}$, we call xor-boxes [57]. In this paper, we will consider particular examples of xor-boxes (see Fig. 1), which we now describe:

$CH^{(n)}$ is a box with $n$ observables on $G^{(n)}_{CH} = (\{A_1, A_2, \ldots, A_n\}, \{\{A_1, A_2\}, \{A_2, A_3\}, \ldots, \{A_{n-1}, A_n\}, \{A_n, A_1\}\})$ with $g(\lambda_c) = P^{(2)}_{even}$ for all but the last context, and for the last one $g(\lambda_c) = P^{(2)}_{odd}$ (see and references therein). Note that $CH^{(4)}$ is a Popescu-Rohrlich (PR) box.

PM is a box on $G_{PM} = (\{A_1, \ldots, A_9\}, \{\{A_1, A_2, A_3\}, \{A_4, A_5, A_6\}, \{A_7, A_8, A_9\}, \{A_1, A_4, A_7\}, \{A_2, A_5, A_8\}, \{A_3, A_6, A_9\}\})$ with $g(\lambda_c) = P^{(3)}_{even}$ for the first 5 contexts, and $g(\lambda_c) = P^{(3)}_{odd}$ for the 6th one [T5 1 [TT ].



FIG. 1: Depiction of the hypergraphs of the exemplary xor-boxes: (a) PR box, (b) $CH^{(5)}$ box, (c) PM box, (d) M box. Vertices denote observables. Each solid line corresponds to a context with distribution $P^{(m)}_{even}$ and each dashed line corresponds to a context with distribution $P^{(m)}_{odd}$, where $m$ is the number of vertices that belong to the context: 2 for the PR box and $CH^{(5)}$, 3 for the PM box and 4 for the M box.



M is a box on $G_M = (\{A, B, C, D, E, a, b, c, d, e\}, \{\{B, e, a, D\}, \{D, b, c, A\}, \{A, d, e, C\}, \{C, a, b, E\}, \{E, c, d, B\}\})$ with $g(\lambda_c) = P^{(4)}_{even}$ for the first 4 contexts, and $g(\lambda_c) = P^{(4)}_{odd}$ for the 5th one [H].
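That these xor-boxes are contextual can be checked by brute force. The sketch below is illustrative Python and not part of the original work: it enumerates all deterministic 0/1 assignments to the observables and counts how many context parities a single assignment can match at once. The helper names, the 0-based indexing of observables, and the row/column reading of the PM contexts are assumptions made for this example.

```python
from itertools import product

def max_satisfied(num_obs, contexts, parities):
    """Largest number of contexts whose parity one 0/1 assignment can match."""
    best = 0
    for bits in product((0, 1), repeat=num_obs):
        hits = sum(
            1
            for ctx, par in zip(contexts, parities)
            if sum(bits[i] for i in ctx) % 2 == par
        )
        best = max(best, hits)
    return best

def chain(n):
    """Chain box: n pairwise contexts on a cycle, the last one with odd parity."""
    contexts = [(i, (i + 1) % n) for i in range(n)]
    parities = [0] * (n - 1) + [1]
    return n, contexts, parities

# Peres-Mermin: rows and columns of a 3x3 array (assumed reading), last one odd.
PM = (
    9,
    [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6), (1, 4, 7), (2, 5, 8)],
    [0, 0, 0, 0, 0, 1],
)

# Mermin: observables A,B,C,D,E,a,b,c,d,e mapped to indices 0..9, last context odd.
M = (
    10,
    [(1, 9, 5, 3), (3, 6, 7, 0), (0, 8, 9, 2), (2, 5, 6, 4), (4, 7, 8, 1)],
    [0, 0, 0, 0, 1],
)

for name, box in [("CH(4)", chain(4)), ("CH(5)", chain(5)), ("PM", PM), ("M", M)]:
    print(name, max_satisfied(*box), "of", len(box[1]))
```

In each of these hypergraphs every observable belongs to exactly two contexts, so the parities of all contexts always sum to zero for a fixed assignment, while the box demands a total parity of one. No single assignment can therefore match every context, which is why at most all but one context are satisfied.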

For the above boxes we consider their opposite versions, $PR'$, $PM'$ and $M'$, obtained by exchanging $P_{even}$ with $P_{odd}$ in the definition. We then consider families of isotropic boxes (or isotropic xor-boxes) by taking mixtures of the introduced boxes with their opposite versions, with parameter $\alpha \in [0,1]$; e.g. the family of isotropic $CH^{(n)}$ boxes is the following:

$$CH^{(n)}_\alpha = \alpha\, CH^{(n)} + (1 - \alpha)\, CH'^{(n)}. \qquad (1)$$

Analogously we define $PM_\alpha$ and $M_\alpha$. In what follows, we will be interested mainly in specific ranges of $\alpha$ (see Appendix I D for explanation), namely $\alpha \in (\frac{n-1}{n}, 1]$ for $CH^{(n)}_\alpha$, $\alpha \in (\frac{5}{6}, 1]$ for $PM_\alpha$ and $\alpha \in (\frac{4}{5}, 1]$ for $M_\alpha$. In these ranges, the isotropic xor-boxes are contextual.
A given hypergraph $G$ yields a definition of a large family of consistent boxes $C_G^{(n)}$. For example, one can consider only those boxes which emerge from measurements of some observables on a quantum state. One can narrow the latter family even further by considering in $V_G$ only observables that are rank-1 projectors (see e.g. [HI [201 [31] and references therein). Then commensurability of 2 projectors turns out to be mutual orthogonality, and the hypergraph can be interpreted as an orthogonality graph. This is the case for the result of Klyachko et al. [25], where a quantum state of a qutrit is found, together with measurements which give rise to the maximal violation of the so-called pentagon inequality. Such a pair, a quantum state and a set of measurements, naturally defines a box, further called the Klyachko (K) box. Using techniques similar to those for xor-boxes, we find the value of our measures for this box.

Different approaches to define measures of contextuality - the contextuality cost. Our goal is to introduce functions which quantify the contextuality content of boxes that are outside of $NC_G$. Starting from a direct approach to this problem, we also identify good properties of such a function, further called a measure of contextuality.

We would like to note that there is an obvious way to quantify contextuality, using the strength of violation of some Kochen-Specker (KS) inequality. This approach, however, is not universal, since there are boxes that are contextual but do not violate this specific KS inequality [58]. Thus we demand that our measure of contextuality $X$ should be faithful, i.e. nonzero iff the box is contextual.

Another approach is to start from some known measure of non-locality and define it properly for all (also one-partite) boxes. This leads us to the contextuality cost, which we define as follows:

$$C(B) := \inf\{p \in [0,1] \,|\, B = p B_C + (1 - p) B_{NC}\} \qquad (2)$$

where the infimum is taken over all decompositions of the box $B$ into a mixture of some non-contextual box $B_{NC}$ and some contextual box $B_C$. This measure inherits from the nonlocality cost the property that it does not increase under operations that preserve non-contextuality, which are the operations satisfying the following axioms: (i) they transform boxes into boxes, (ii) they are linear, (iii) they preserve consistency, (iv) they transform non-contextual boxes into non-contextual ones. This holds for the same reason for which the anti-robustness of nonlocality is non-increasing under the class of locality preserving operations, as shown in [13]. We note also that this measure is faithful by definition, and that one can easily compute it using linear programming; it is, however, not extensive, i.e. it is not proportional to the dimension of the system. For the families of isotropic boxes it can be found analytically, namely $C(PM_\alpha) = 6\alpha - 5$, $C(M_\alpha) = 5\alpha - 4$ and $C(CH^{(n)}_\alpha) = n\alpha - (n - 1)$ (in the same way as it is shown in [15] that $C(PR_\alpha) = 4\alpha - 3$).

Correlational approach to contextuality. Having mentioned direct approaches to defining a measure of contextuality, we now present our main result, which is the introduction of a new measure of contextuality. We build on the fundamental property of any contextual box, namely that there is no single joint probability distribution which describes it. Exploiting this fact enables us to define an operational quantity called the optimal contextual correlation factor of a box.

To introduce the appropriate scenario, consider a situation in which one would like to simulate a given box using non-contextual resources. If the box is non-contextual, one would only need to produce a single distribution that describes the box. However, if the box is contextual, one needs at least 2 different distributions to simulate the box, depending on the context. One can then imagine an index $c$ that numbers the contexts from some set of contexts $C$. If there is the same single distribution for all contexts, there cannot be any correlations between the index $c$ and the distribution. However, if the box is contextual, then by discriminating between the distributions that belong to different contexts one can obtain correlations with the index $c$. To quantify such correlations, one needs to consider an a priori distribution of the index, giving rise to a random variable. To make the quantification precise, we now introduce a scenario which involves 3 parties: Sender, Receiver and Adversary. For a hypergraph $G = (V_G, E_G)$ with $V_G$ of cardinality $n$, all the parties agree in advance on some a priori fixed box $B = \{g(\lambda_c)\} \in C_G^{(n)}$ in the hands of the Sender. The parties then follow the protocol:

• The Sender chooses a message $c$ with probability $p(c)$, in order to communicate $c$ to the Receiver.

• The Adversary is supposed to transfer the message as badly as possible (he/she wants to minimize the communication from the Sender to the Receiver) by preparing a variable $\Lambda_c$ (having a distribution on the probability space of all $n$ observables from $V_G$) compatible with the box $B$ on input $c$ (i.e. having a distribution on the same probability space as $g(\lambda_c)$). The Adversary, however, creates only those $\Lambda_c$ whose distribution on context $c$ equals $g(\lambda_c)$, i.e. the distribution of the box $B$ on this context.

• The Receiver makes the most general measurement on $\Lambda_c$ and produces the variable $C'$.

The goals of the parties are the following: for fixed $p(c)$, the Sender and the Receiver want to maximize the Shannon mutual information [33] between $C$ and $C'$, $I(C : C')$, so the Receiver should measure all possible commensurable degrees of freedom of $\Lambda_c$. The Adversary, on the other hand, wants to minimize this quantity over his/her preparation of the system $\Lambda_c$. Given an a priori distribution $p(c)$, the following quantity

$$I_{\{p(c)\}}(B) := \min_{\{\Lambda_c\}} I\Big(\sum_c p(c)\, |c\rangle\langle c| \otimes \Lambda_c\Big), \qquad (3)$$

we will call the contextual correlation factor of a box $B$ given a priori statistics $\{p(c)\}$, where we use Dirac notation only for convenience, meaning a classically correlated system of variables $\Lambda_c$ correlated with a register holding the value $c$. By the optimal contextual correlation factor of a box $B$ we mean the following quantity:

$$I_{max}(B) := \sup_{\{p(c)\}} I_{\{p(c)\}}(B). \qquad (4)$$

(Uniform) Relative entropy of contextuality. In this section we introduce another measure, called the relative entropy of contextuality, as it is based directly on the notion of relative entropy distance. The measure is defined on any box $B = \{g(\lambda_c)\} \in C_G^{(n)}$ as follows:

$$X_{max}(B) := \sup_{p(c)} \min_{p(\lambda)} \sum_c p(c)\, D(g(\lambda_c) \| p(\lambda_c)) \qquad (5)$$

where $D(g(\lambda_c) \| p(\lambda_c)) = \sum_i g(\lambda_c)_i \log \frac{g(\lambda_c)_i}{p(\lambda_c)_i}$ is the relative entropy distance between the distributions $g(\lambda_c)$ and $p(\lambda_c)$ [43], [59]. The minimization is taken over all distributions $p(\lambda)$ over $\Omega(A_1) \times \ldots \times \Omega(A_k)$, with the marginal distribution on context $c$ equal to $p(\lambda_c)$, and the supremum is taken over probability distributions $p(c)$ on the set of numbers of contexts $\{1, \ldots, n\}$.

A natural quantity is also the one which does not distinguish the contexts:

$$X_u(B) := \min_{p(\lambda)} \sum_c \frac{1}{n} D(g(\lambda_c) \| p(\lambda_c)) \qquad (6)$$

where $n$ is the cardinality of $E_G$, i.e. the number of contexts. We call it the uniform relative entropy of contextuality.
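To illustrate the objective minimized in (6), the following Python sketch evaluates it for one candidate joint distribution. It is an illustration under stated assumptions rather than the authors' procedure: the box is the four-context cycle with three even-parity contexts and one odd-parity context, and the candidate joint distribution (uniform over the deterministic assignments that miss exactly one context) is a guess, so the printed number is only an upper bound on $X_u$ for this box. All function names are hypothetical.

```python
from itertools import product
from math import log2

CONTEXTS = [(0, 1), (1, 2), (2, 3), (3, 0)]
PARITIES = [0, 0, 0, 1]

def context_marginal(joint, ctx):
    """Marginal of a joint distribution over all observables on one context."""
    out = {}
    for bits, prob in joint.items():
        key = tuple(bits[i] for i in ctx)
        out[key] = out.get(key, 0.0) + prob
    return out

def relative_entropy(g, p):
    """D(g || p) in bits; infinite if p vanishes where g does not."""
    total = 0.0
    for key, gi in g.items():
        if gi == 0:
            continue
        pi = p.get(key, 0.0)
        if pi == 0:
            return float("inf")
        total += gi * log2(gi / pi)
    return total

def uniform_objective(box, joint):
    """Average over contexts of D(g(lambda_c) || p(lambda_c))."""
    terms = [relative_entropy(g, context_marginal(joint, ctx)) for ctx, g in box.items()]
    return sum(terms) / len(terms)

# The box: uniform distributions over pairs of bits with the required parity.
box = {
    ctx: {b: 0.5 for b in product((0, 1), repeat=2) if sum(b) % 2 == par}
    for ctx, par in zip(CONTEXTS, PARITIES)
}

def violations(bits):
    """Number of contexts whose parity the assignment fails to match."""
    return sum(
        1 for ctx, par in zip(CONTEXTS, PARITIES)
        if (bits[ctx[0]] + bits[ctx[1]]) % 2 != par
    )

# Candidate joint distribution: uniform over assignments missing exactly one context.
good = [bits for bits in product((0, 1), repeat=4) if violations(bits) == 1]
joint = {bits: 1.0 / len(good) for bits in good}

value = uniform_objective(box, joint)
print(len(good), value, log2(4 / 3))
```

Each context marginal of this candidate puts weight 3/4 on the parity required by the box, so every term equals $\log_2(4/3)$ and so does the average.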

We note here that both measures are faithful. Indeed, the relative entropy is lower bounded by the square of the variational distance between the probability distributions, which is zero only if all $g(\lambda_c)$ are equal to $p(\lambda_c)$. For a contextual box this cannot hold, since there is no single joint probability distribution with marginals $g(\lambda_c)$. By definition we have $X_{max} \geq X_u$, but in general these measures are not equal, since they differ on a direct sum of a contextual and a non-contextual box (see Appendix I C).



Equivalence of relative entropy of contextuality and optimal contextual correlation factor. We can now state one of our main results, namely that the optimal contextual correlation factor is equal to the relative entropy of contextuality, i.e.

$$X_{max} = I_{max}. \qquad (7)$$

To prove this, we first note that

$$I_{\{p(c)\}}(B) = \min_{\{g(\lambda|c)\,:\,g(\lambda_c|c) = g(\lambda_c)\}} \sum_c p(c)\, D\Big(g(\lambda|c) \,\Big\|\, \sum_c p(c)\, g(\lambda|c)\Big) \qquad (8)$$

where the $g(\lambda|c)$ are to be identified with $\Lambda_c$ and the condition $g(\lambda_c|c) = g(\lambda_c)$ is imposed on the Adversary. We further use another "intermediate" measure of contextuality, $I'_{max}(B) := \sup_{\{p(c)\}} \min_{\{g(\lambda|c)\,:\,g(\lambda_c|c) = g(\lambda_c)\},\, p(\lambda)} \sum_c p(c)\, D(g(\lambda|c) \| p(\lambda))$, which we introduce to show the equivalence (see Theorem [I]).
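The rewriting of mutual information as an average relative entropy to the mixture, used in (8), is a standard identity and can be checked numerically. The Python sketch below does so for an arbitrary example; the prior and the conditional distributions are made-up numbers for illustration, and the function names are hypothetical.

```python
from math import log2

def entropy(dist):
    """Shannon entropy in bits."""
    return -sum(x * log2(x) for x in dist if x > 0)

def relative_entropy(g, p):
    """D(g || p) in bits for two lists over the same outcomes."""
    return sum(gi * log2(gi / pi) for gi, pi in zip(g, p) if gi > 0)

def mixture(prior, conditionals):
    """Average distribution sum_c p(c) g(. | c)."""
    size = len(conditionals[0])
    return [sum(pc * cond[i] for pc, cond in zip(prior, conditionals)) for i in range(size)]

def mutual_information(prior, conditionals):
    """I(C : Lambda) computed as H(C) + H(Lambda) - H(C, Lambda)."""
    joint = [pc * x for pc, cond in zip(prior, conditionals) for x in cond]
    return entropy(prior) + entropy(mixture(prior, conditionals)) - entropy(joint)

def weighted_divergence(prior, conditionals):
    """sum_c p(c) D(g(. | c) || sum_c p(c) g(. | c))."""
    mix = mixture(prior, conditionals)
    return sum(pc * relative_entropy(cond, mix) for pc, cond in zip(prior, conditionals))

# Made-up example: three contexts, four outcomes of the adversary's variable.
prior = [0.5, 0.3, 0.2]
conditionals = [
    [0.7, 0.1, 0.1, 0.1],
    [0.25, 0.25, 0.25, 0.25],
    [0.0, 0.5, 0.5, 0.0],
]

print(mutual_information(prior, conditionals), weighted_divergence(prior, conditionals))
```

The two printed numbers agree up to floating-point error. If all conditionals coincide, both vanish, which mirrors the remark that a single distribution for all contexts carries no correlation with the index $c$.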

Analytical formulas of uniform relative entropy of contextuality for isotropic xor-boxes. We calculate here the value of $X_u$ for isotropic xor-boxes. We only give the idea for $PR_\alpha$; the detailed proof for the other xor-boxes is shown in Appendix I D and I E. The techniques employed are analogous to those used in entanglement theory, including twirling [3S] as well as the use of symmetries to compute measures based on the distance from the set of separable states [40], [44], and they were applied in the case of nonlocality e.g. in [5S1[37]. The first step is to observe that for isotropic boxes there is

$$X_u(B) = \min_{p(\lambda) \in I_B} \sum_{c \in E_G} \frac{1}{n} D(g(\lambda_c) \| p(\lambda_c)) \qquad (9)$$

where the minimum is taken over all probability distributions $p(\lambda)$ which give rise to an isotropic box (from the set $I_B$), and $p(\lambda_c)$ is the marginal of $p(\lambda)$. To show this, we consider $G$ such that $B \in C_G^{(n)}$, and a group of automorphisms of $B$ which can be achieved by operations that transform $NC_G$ into $NC_G$, i.e. preserve non-contextuality; call it $G_C$. The idea is to apply to a box $B$ a twirling operation: $B \mapsto \sum_{f \in G_C} |G_C|^{-1} f(B)$, where $|G_C|$ is the number of different automorphisms $h_i \circ \pi_i$, which in our case are permutations of contexts $\pi_i$ composed with appropriate negations of outputs of observables $h_i$. Since each such operation is an automorphism, each $g(\lambda_c)$ is transformed into itself, while because the operation preserves non-contextuality, the optimal $p(\lambda)$ must be within the set of twirled $p(\lambda)$, i.e. within the set of non-contextual isotropic boxes (see Theorem [I]).

FIG. 2: Values of the measure $X_u$ for $CH^{(n)}_\alpha$ boxes for $3 \leq n \leq 50$: maximally contextual boxes (upper points, $\alpha = 1$); maximally contextual quantum boxes (lower points) with (i) odd $n$, $\alpha = $ (ii) even $n$, $\alpha = (1 + \cos(\pi/n))/2$.

Let us consider the example of the $PR_\alpha$ box (the other examples of isotropic xor-boxes follow similar lines, see Appendix I E), for which

$$X_u(PR_\alpha) = \min_{p(\lambda) = PR_{\alpha'}} \frac{1}{4} \sum_c D(g(\lambda_c) \| p(\lambda_c)), \qquad (10)$$

where $p(\lambda)$ runs over distributions which are from the family of isotropic boxes [36, 37] that are non-contextual. This is a real simplification, since the distributions $p(\lambda_c)$ of a box from this family are built from only 2 kinds of distributions: $P^{(2)}_{even}$ and $P^{(2)}_{odd}$. Hence our quantity reads

$$X_u(PR_\alpha) = \min_{\alpha'} \frac{1}{4} \Big( 3 D\big(\alpha P^{(2)}_{even} + (1 - \alpha) P^{(2)}_{odd} \,\big\|\, \alpha' P^{(2)}_{even} + (1 - \alpha') P^{(2)}_{odd}\big) + D\big(\alpha P^{(2)}_{odd} + (1 - \alpha) P^{(2)}_{even} \,\big\|\, \alpha' P^{(2)}_{odd} + (1 - \alpha') P^{(2)}_{even}\big) \Big) \qquad (11)$$

where $\alpha'$ is bounded by the fact that the distribution $PR_{\alpha'}$ is non-contextual. To formalize this statement we denote by $g'(\lambda_c)_i$ the probability of outcome $i$ upon measuring context $c$ of a box $\{g'(\lambda_c)\}$. More precisely, any non-contextual box compatible with $G^{(4)}_{CH}$ has to satisfy the inequality, equivalent to the CHSH inequality,

$$1 \leq \sum_c \sum_{i \in supp(g(\lambda_c))} g'(\lambda_c)_i \leq 3 \qquad (12)$$

where the $g(\lambda_c)$ are the contexts of a PR box and supp denotes the support of the distribution $g(\lambda_c)$ (see Appendix I E).



From this fact it follows that $\frac{1}{4} \leq \alpha' \leq \frac{3}{4}$. The next step is to observe that relative entropy does not change under reversible operations such as a bit-flip of an output of an observable (see lemma [?]), which gives:

$$X_u(PR_\alpha) = \min_{\frac{1}{4} \leq \alpha' \leq \frac{3}{4}} D\big(\alpha P^{(2)}_{even} + (1 - \alpha) P^{(2)}_{odd} \,\big\|\, \alpha' P^{(2)}_{even} + (1 - \alpha') P^{(2)}_{odd}\big).$$

It is then easy to show that for $\alpha > \frac{3}{4}$ there holds

$$X_u(PR_\alpha) = \log\frac{4}{3^{\alpha}} - h(\alpha) \qquad (13)$$

where $h(\alpha) = -\alpha \log \alpha - (1 - \alpha) \log(1 - \alpha)$ is the binary Shannon entropy. For $\alpha < \frac{1}{4}$, $X_u(PR_\alpha)$ equals the value of $X_u(PR_{1-\alpha})$ according to the above equation. In Fig. 2 we present values of the measure $X_u$ for chosen chain boxes $CH^{(n)}_\alpha$ (quantum ones provided in [41] and maximally contextual ones).
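The closed form for $X_u(PR_\alpha)$ can be cross-checked against a direct numerical minimization over $\alpha'$. The Python sketch below is an illustration with hypothetical function names. It relies on one observation that follows from the definitions: $P^{(2)}_{even}$ and $P^{(2)}_{odd}$ are uniform on disjoint supports, so the relative entropy between two such mixtures reduces to the binary relative entropy between $\alpha$ and $\alpha'$. Logarithms are taken to base 2, and the grid search is an assumed stand-in for an exact minimization.

```python
from math import log2

def binary_entropy(a):
    """Binary Shannon entropy h(a) in bits."""
    if a in (0.0, 1.0):
        return 0.0
    return -a * log2(a) - (1 - a) * log2(1 - a)

def binary_relative_entropy(a, b):
    """D((a, 1-a) || (b, 1-b)) in bits, assuming 0 < b < 1."""
    total = 0.0
    for x, y in ((a, b), (1 - a, 1 - b)):
        if x > 0:
            total += x * log2(x / y)
    return total

def xu_pr_numeric(alpha, steps=5000):
    """Grid minimization over the non-contextual range 1/4 <= alpha' <= 3/4."""
    grid = (0.25 + 0.5 * i / steps for i in range(steps + 1))
    return min(binary_relative_entropy(alpha, ap) for ap in grid)

def xu_pr_closed_form(alpha):
    """log(4 / 3^alpha) - h(alpha), intended for alpha > 3/4."""
    return log2(4 / 3 ** alpha) - binary_entropy(alpha)

for alpha in (0.8, 0.9, 1.0):
    print(alpha, xu_pr_numeric(alpha), xu_pr_closed_form(alpha))
```

For $\alpha > \frac{3}{4}$ the minimum is attained at the boundary point $\alpha' = \frac{3}{4}$, since the binary relative entropy decreases as $\alpha'$ approaches $\alpha$ from below.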

Uniform relative entropy of contextuality for the Klyachko box. In [25], a qutrit state $|\psi\rangle$ and a set of projectors $\{P_1, \ldots, P_5\}$ are provided, designed in such a way that they violate the so-called pentagram inequality. This setup corresponds to a hypergraph that is a pentagon, so that the projectors commute in pairs: $P_1$ with $P_2$, $P_2$ with $P_3$, and so on in a circle, so that the last commuting pair is $P_5$ with $P_1$. The graph is thus $G^{(5)}_{CH}$, but there are additional restrictions: since the observables are one-dimensional projectors, their commutation implies their orthogonality. This implies that, e.g., the probability of the result 11, which corresponds to obtaining the outcome 1 for both projectors as a result of the measurement, is zero. More specifically, the Klyachko box is given by 5 identical distributions of the form $g(00) = 1 - \frac{2}{\sqrt{5}}$, $g(01) = g(10) = \frac{1}{\sqrt{5}}$ and $g(11) = 0$. This implies that there are fewer symmetries than in the xor-boxes. After the corresponding twirling, which we introduce in Appendix I E 1, there are 2 parameters left. Fortunately, we can make use of the fact that $g(11) = 0$ in a different way: it means that the corresponding classical probability distribution should have $p(11) = 0$ for all contexts, since the formula minimizes over the classical probabilities. In turn, we obtain

$$X_u(K) \approx 0.0467. \qquad (14)$$

The quest for additivity of uniform relative entropy of contextuality. One of the most welcome properties of a measure would be its additivity. Thus we now turn to the question whether $X_u$ is additive and give here partial answers for the isotropic boxes. More formally, we say that a measure $X$ is $k$-copy additive on input $B$ if $X(B^{\otimes k}) = kX(B)$ (see Definition [i] of the tensor product of hypergraphs $G$ and boxes in $C_G^{(n)}$). If $X$ is $k$-copy additive for any natural $k$, we say that $X$ is additive on $B$. We prove a general theorem (Theorem 8), which in particular implies that for families of isotropic boxes $X_u$ is 2-copy additive. For boxes which are extremal within the family of isotropic xor-boxes (such as $CH^{(n)}$, PM, M), $X_u$ is additive. We conjecture, however, that the proposed measure is additive for all isotropic boxes.

The quest for monotonicity. We now prove monotonicity of $X_u$ and $X_{max}$ under operations which lie within the set of non-contextuality preserving ones. Given a box $B = \{g(\lambda_c)\} \in C_G^{(n)}$, we define an operation $\Lambda$ as a mixture, with some probabilities $p_j$, of independent channels $\Lambda_i^{(j)}$ acting on the observable $i \in \{1, \ldots, k\}$. It is easy to see that the set of operations defined in this way is a convex subset of the non-contextuality preserving operations. To see that $X_u$ and $X_{max}$ are monotonic under these operations, we use the fact that both measures can be expressed in terms of mutual information (3). Application of $\Lambda$ is then a local action on the variable $\Lambda_c$, and monotonicity of both $X_u$ and $X_{max}$ follows directly from the data processing inequality [43].

Discussion. We have introduced two measures of contextuality. One is the contextuality cost and the other is the relative entropy of contextuality. Both measures are faithful, i.e. nonzero if and only if the box is contextual, and have an operational meaning. We have proven (in part) some other properties of these measures, such as monotonicity and additivity, in the case of the variant of the relative entropy of contextuality called the uniform relative entropy of contextuality. To achieve this, we developed symmetrization techniques, namely twirling operations, which are of independent interest. Due to additivity for extremal isotropic boxes, we can hope that the (uniform) relative entropy of contextuality is in general an extensive measure. It would be desirable to prove full monotonicity of the (uniform) relative entropy of contextuality under wirings, as well as additivity, which we conjecture to hold at least for all isotropic boxes. We have generalized the notion of twirling, introduced in the context of non-locality, to the single-partite case, which enabled us to compute the value of the contextuality cost and of the uniform relative entropy of contextuality for certain families of boxes invariant under twirling (the exemplary isotropic xor-boxes) and for the Klyachko box. It would be interesting to calculate it for all isotropic xor-boxes and also for non-isotropic boxes.

Our approach can be developed in different ways. First, one can define measures analogous to $X_u$ and $X_{max}$ by setting the variational distance in place of the relative entropy. It seems that analogous results can be obtained for these measures. Another interesting approach is to define a new measure specific to non-locality: in place of non-contextual boxes in the definitions of $X_u$ and $X_{max}$ one can consider local boxes, which differs from these measures, since in larger bi- or multi-partite systems there are local boxes that are non-contextual. This measure was introduced and studied independently by F.G.S.L. Brandao, where it was shown to be the rate of transition between boxes under transformations using operations that preserve nonlocality [45]. While $X_u$ reaches 0 asymptotically for $CH^{(n)}$ boxes, its non-local version seems to remain non-zero in the asymptotic case ^?Tj. One can also consider a measure that has a more communicational meaning than $X_{max}$, defined as $\min_{\{\Lambda_c\}} \max_{p(c)} I(\sum_c p(c)\, |c\rangle\langle c| \otimes \Lambda_c)$, i.e. with the order of min and max exchanged relative to (3) and (4). This measure may not be equal to the analogous version of $X_{max}$, but it has a communicational meaning: it is the minimal capacity of the channel from Sender to Receiver under the Adversary's attack.

Note that another way of defining the relative entropy of contextuality would be to consider a quantity defined on a box $B$ compatible with a graph $G$ as

$$X^*(B) := \inf_{B_{NC}} D(B \| B_{NC}) \qquad (15)$$

where $D$ denotes the relative entropy of the boxes $B$ and $B_{NC}$, defined operationally via the distinguishability of the box $B$ from the box $B_{NC}$ in [46]. It would be interesting to relate the measure defined in this way to $X_{max}$ and $X_u$. Note also that, following [13], it is easy to define and study the notion of (anti)robustness of contextuality. This measure will be used in [47]. It would also be interesting to investigate a possible connection between our measures and the entropic tests of contextuality put forward in [55', (which have their roots in entropic Bell inequalities [50]).

Finally, we note that our measures can be useful for the description of experimental results, as they are based on correlations between measurement outcomes rather than on mutual exclusiveness of observables. This is important, since in practice it is very difficult to satisfy the latter condition in an experiment.

I. APPENDIX 

In this section we show details of the presented results. 

A. Preliminaries 

We denote a hypergraph as $G := (V_G, E_G)$, where $V_G = \{A_1, \ldots, A_k\}$ is a set of $k$ observables and $E_G$ is the set of contexts of the hypergraph, i.e. the set of subsets of mutually commensurable observables of $V_G$. A box has an input $x$ with cardinality $n$ equal to the number of edges of the hypergraph (the number of contexts in a given $G$), and (for simplicity we assume) each output has the same cardinality $d$, equal to the product of the output cardinalities of the $A_i$ which contribute to the corresponding context. The set of such boxes we denote as $B_G^{(k)}$. We say that a box is compatible with a hypergraph $G$ if it is a family of $n$ probability distributions such that for each $c = \{A_{i_1}, \ldots, A_{i_{|c|}}\} \in E_G$, where $|c|$ is the cardinality of the context $c$, there is a corresponding probability distribution in this family on $\Omega(A_{i_1}) \times \ldots \times \Omega(A_{i_{|c|}})$. We denote it as a family of distributions $\{P(a|x_i)\}$ with $x_i \in E_G$.

Definition 1 For a given hypergraph $G = (V_G, E_G)$, a box compatible with G is a consistent box if for all pairs $c, c' \in E_G$ and for the set of observables $S = c \cap c' \neq \emptyset$ there is

$\forall_s \; \sum_t \Pr(S = s, T = t | x = c) = \sum_{t'} \Pr(S = s, T' = t' | x = c')$

where $T = c - S$ and $T' = c' - S$. The set of all consistent boxes compatible with a hypergraph G that has n contexts is denoted as $C_G^{(n)}$.

Note that the well-known non-signaling condition is a special case of consistency defined in this way.
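The consistency condition of Definition 1 can be checked mechanically. The following is a minimal sketch, assuming a box is stored as a dictionary mapping each context (a tuple of observable names) to a dictionary of outcome tuples and probabilities; this representation and the names `marginal`, `is_consistent`, `pr_box` and `signaling_box` are illustrative choices, not notation from the paper.

```python
from collections import defaultdict
from itertools import combinations


def marginal(context, dist, subset):
    """Marginal of `dist` (outcome tuple -> probability) on the observables in `subset`."""
    idx = [context.index(a) for a in subset]
    out = defaultdict(float)
    for outcome, prob in dist.items():
        out[tuple(outcome[i] for i in idx)] += prob
    return out


def is_consistent(box, tol=1e-9):
    """Check that every pair of contexts agrees on the marginal of their shared observables."""
    for (c1, d1), (c2, d2) in combinations(box.items(), 2):
        shared = sorted(set(c1) & set(c2))
        if not shared:
            continue
        m1, m2 = marginal(c1, d1, shared), marginal(c2, d2, shared)
        if any(abs(m1[k] - m2[k]) > tol for k in set(m1) | set(m2)):
            return False
    return True


# Popescu-Rohrlich box: outcomes satisfy a XOR b = x AND y, each allowed pair with probability 1/2.
pr_box = {
    (f"A{x}", f"B{y}"): {(a, b): 0.5 for a in (0, 1) for b in (0, 1) if (a ^ b) == (x & y)}
    for x in (0, 1) for y in (0, 1)
}

# A deliberately inconsistent family: the outcome of A0 depends on the context.
signaling_box = {
    ("A0", "B0"): {(0, 0): 1.0},
    ("A0", "B1"): {(1, 0): 1.0},
}
```

For two-party boxes the shared observable of two contexts is one party's setting, so the check reduces to the familiar non-signaling condition.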

Definition 2 A non-contextual box associated with a hypergraph G is a consistent box with the property that there exists a common joint probability distribution for all the observables in $V_G$. The set of all such boxes compatible with G we denote as $NC_G$. All boxes that are consistent but do not satisfy this condition we call contextual.

Similarly as in the main text, to specify the distributions that belong to a box $B \in C_G^{(n)}$ we will denote it as $\{g(\lambda_c)\}$, where c numbers the contexts, running from 1 to n. If it is not stated otherwise, in what follows we assume $n \ge 3$, since for $n \le 2$ all boxes compatible with any hypergraph G are non-contextual. If a box B is non-contextual, we denote it as $\{p(\lambda_c)\}$, and by $p(\lambda)$ we will denote the joint probability distribution on $V_G$ (which exists by definition of a non-contextual box) of which the $p(\lambda_c)$'s are the appropriate marginals. For short, by $p(\lambda) \in S$ for some set of boxes S we mean that the non-contextual box defined by $p(\lambda)$ belongs to S, where the graph G with which this box is compatible should be understood from the context. We now make a trivial observation about these boxes:

Observation 1 A consistent box on $G = (\{A_1, \ldots, A_k\}, E_G)$ is non-contextual iff it can be written as a convex combination of consistent deterministic boxes, i.e. such that the joint probability distribution of the outputs of all observables $A_1, \ldots, A_k$ equals $\delta_{a_0,a}$ for some fixed vector $a_0$.
Proof.

It follows from the definition of non-contextual boxes: the joint probability distribution of all observables $A_1, \ldots, A_k$ is a mixture of the deterministic ones. ∎



B. Proof of equivalence 

In this section we present one of the main results of this work: the equality of the optimal correlation factor and the relative entropy of contextuality. In this and the next section, for the sake of the proof, we will also use a quantity defined on a box $B = \{g(\lambda_c)\}$ as $X_{\{p(c)\}}(B) := \min_{p(\lambda)} \sum_c p(c) D(g(\lambda_c)\|p(\lambda_c))$, which is a version of the relative entropy of contextuality for a fixed distribution $\{p(c)\}$.

Theorem 1 For any box $B = \{g(\lambda_c)\} \in C_G^{(n)}$, there holds $I_{\max}(B) = X_{\max}(B)$.

Proof. 

To show the equality, we introduce another measure of contextuality $I'_{\max}$,

$I'_{\max}(B) = \sup_{\{p(c)\}} \min_{\{g(\lambda|c): g(\lambda_c|c) = g(\lambda_c)\}, p(\lambda)} \sum_c p(c) D(g(\lambda|c)\|p(\lambda))$ (16)

and prove $X_{\max}(B) = I'_{\max}(B) = I_{\max}(B)$. The proof will not involve optimality of the distribution $p(c)$, over which in all quantities we take the supremum, so we show the equality for $I'_{\{p(c)\}} := \min_{\{g(\lambda|c): g(\lambda_c|c) = g(\lambda_c)\}, p(\lambda)} \sum_c p(c) D(g(\lambda|c)\|p(\lambda))$, $X_{\{p(c)\}}$ and $I_{\{p(c)\}}$, from which the desired equality follows. We then fix $p(c)$ and $B \in C_G^{(n)}$ arbitrarily from now on, and show that $X_{\{p(c)\}} = I'_{\{p(c)\}} = I_{\{p(c)\}}$. We prove now the first of these equalities. It is easy to see that $X_{\{p(c)\}}(B) \le I'_{\{p(c)\}}(B)$, since relative entropy does not increase under partial trace. To see the converse inequality, consider the optimal classical probability distribution in $X_{\{p(c)\}}$, call it $p^*(\lambda)$ (see Appendix I C for the proof that such $p^*(\lambda)$ exists), with marginals $p^*(\lambda_c)$. Then find conditional probability distributions $p^*(\lambda'_c|\lambda_c)$ such that $p^*(\lambda'_c|\lambda_c) p^*(\lambda_c) = p^*(\lambda)$, where $\lambda = \lambda'_c \lambda_c$, and define $g(\lambda|c) = p^*(\lambda'_c|\lambda_c) g(\lambda_c)$. It is easy to check that such a choice saturates the inequality $I'_{\{p(c)\}}(B) \ge X_{\{p(c)\}}(B)$, giving equality.

To see that $I_{\{p(c)\}}(B) = I'_{\{p(c)\}}(B)$, we use the following fact:

$I(\sum_c p(c)|c\rangle\langle c| \otimes \lambda_c) = \sum_c p(c) D(g(\lambda|c)\|\sum_c p(c) g(\lambda|c)) = \min_{p(\lambda)} \sum_c p(c) D(g(\lambda|c)\|p(\lambda))$, (17)

where $\lambda_c$ has distribution $g(\lambda|c)$, which is proven in lemma 2 below, stated in the more general quantum case (where in place of $g(\lambda|c)$ there is a quantum state and the minimization is over some states $\sigma$). If we set the minimization over $g(\lambda|c)$ having marginals $g(\lambda_c)$ of a box B, we get the desired equality.

Summarizing the results, we get $I_{\{p(c)\}}(B) = I'_{\{p(c)\}}(B) = X_{\{p(c)\}}(B)$ for arbitrary $p(c)$ and B, hence taking the supremum over this distribution proves $I_{\max}(B) = X_{\max}(B)$ for an arbitrary consistent box B. ∎

Before proving equality (17), we need another result, stated in the lemma below. We need it only for random variables, but we state it for quantum states, since it is valid for quantum states in general, and use the fact that quantum relative entropy and relative entropy distance coincide for classical distributions:

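Lemma 1 For a quantum state $\rho$ with subsystems $\mathrm{Tr}_B \rho = \rho_A$ and $\mathrm{Tr}_A \rho = \rho_B$,

$\inf_{\sigma_A \otimes \sigma_B} S(\rho\|\sigma_A \otimes \sigma_B) = S(\rho\|\rho_A \otimes \rho_B)$ (18)

where $\mathrm{Tr}_A$ ($\mathrm{Tr}_B$) denotes the partial trace over system A (B), and S is the quantum relative entropy distance [51].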
Proof. 

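We first note that $\log(\sigma_A \otimes \sigma_B) = (\log \sigma_A) \otimes I_B + I_A \otimes (\log \sigma_B)$, where $I_A$ and $I_B$ are identity operators on systems A and B respectively. Thus

$S(\rho\|\sigma_A \otimes \sigma_B) = -S(\rho) - \mathrm{Tr}\,\rho \log(\sigma_A \otimes \sigma_B)$
$= -S(\rho) + S(\rho_A) + [-S(\rho_A) - \mathrm{Tr}\,\rho_A \log \sigma_A] + S(\rho_B) + [-S(\rho_B) - \mathrm{Tr}\,\rho_B \log \sigma_B]$
$= I(\rho) + S(\rho_A\|\sigma_A) + S(\rho_B\|\sigma_B)$ (19)

where $I(\rho)$ is the quantum mutual information [51]. The last equality proves that $S(\rho\|\sigma_A \otimes \sigma_B) \ge I(\rho)$, because the relative entropy terms $S(\rho_A\|\sigma_A)$ and $S(\rho_B\|\sigma_B)$ are non-negative, but $S(\rho\|\rho_A \otimes \rho_B) = I(\rho)$, hence the equality. ∎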
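For classical distributions the decomposition (19) can be checked numerically. The sketch below is only an illustration under the assumption that all states are diagonal (i.e. ordinary probability tables); the helper `kl` and the randomly drawn distributions are not part of the original text.

```python
import numpy as np


def kl(p, q):
    """Relative entropy D(p||q) in bits; assumes q > 0 wherever p > 0."""
    p = np.asarray(p, dtype=float).ravel()
    q = np.asarray(q, dtype=float).ravel()
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))


rng = np.random.default_rng(0)

# Random joint distribution on a 3 x 4 alphabet and its marginals.
joint = rng.random((3, 4))
joint /= joint.sum()
p_a, p_b = joint.sum(axis=1), joint.sum(axis=0)

# Arbitrary product distribution sigma_A x sigma_B to compare against.
sigma_a = rng.random(3)
sigma_a /= sigma_a.sum()
sigma_b = rng.random(4)
sigma_b /= sigma_b.sum()

mutual_info = kl(joint, np.outer(p_a, p_b))
lhs = kl(joint, np.outer(sigma_a, sigma_b))
rhs = mutual_info + kl(p_a, sigma_a) + kl(p_b, sigma_b)
```

Here `lhs` and `rhs` agree up to rounding, and `lhs` is never smaller than `mutual_info`, which is the classical content of Lemma 1.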
We prove now the lemma needed in the proof of theorem 1. We state it again for quantum states, since it is valid not only for probability distributions:

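Lemma 2 For an arbitrary ensemble of quantum states $\{p(c), \rho_c\}$, there holds

$I(\sum_c p(c)|c\rangle\langle c| \otimes \rho_c) = \inf_\sigma \sum_c p(c) S(\rho_c\|\sigma)$. (20)

Proof.

Let us note that the LHS can be rewritten as $S(\sum_c p(c)|c\rangle\langle c| \otimes \rho_c \,\|\, (\sum_c p(c)|c\rangle\langle c|) \otimes (\sum_c p(c)\rho_c))$. Then, denoting $\sum_c p(c)|c\rangle\langle c| \otimes \rho_c$ as $\rho$, by lemma 1 we have

$S(\rho \,\|\, (\sum_c p(c)|c\rangle\langle c|) \otimes (\sum_c p(c)\rho_c)) = \inf_{\sigma_A \otimes \sigma_B} S(\rho\|\sigma_A \otimes \sigma_B)$.

Knowing that $\sum_c p(c)|c\rangle\langle c|$, i.e. a subsystem of $\rho$, is the best $\sigma_A$ in the above minimization, we can fix it, having

$S(\rho \,\|\, (\sum_c p(c)|c\rangle\langle c|) \otimes (\sum_c p(c)\rho_c)) = \inf_\sigma S(\rho \,\|\, (\sum_c p(c)|c\rangle\langle c|) \otimes \sigma)$. (21)

It is then easy to check that the RHS of the above equals just $\inf_\sigma \sum_c p(c) S(\rho_c\|\sigma)$, and the assertion follows. ∎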
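In the classical case, Lemma 2 says that the weighted average divergence from an ensemble to a common reference distribution is minimized by the ensemble average, and that the minimum equals the mutual information between the label c and the outcome. The following is a hedged numerical sketch; the ensemble, the weights and the random trial points are arbitrary illustrative data, and the random search only probes the minimum rather than proving it.

```python
import numpy as np


def kl(p, q):
    """Relative entropy D(p||q) in bits; assumes q > 0 wherever p > 0."""
    p = np.asarray(p, dtype=float).ravel()
    q = np.asarray(q, dtype=float).ravel()
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))


def avg_divergence(weights, dists, q):
    """sum_c p(c) D(g_c || q) for a common reference distribution q."""
    return sum(w * kl(g, q) for w, g in zip(weights, dists))


rng = np.random.default_rng(1)
weights = np.array([0.5, 0.3, 0.2])          # p(c)
dists = rng.random((3, 5))                   # g_c, one row per label c
dists /= dists.sum(axis=1, keepdims=True)

mixture = weights @ dists                    # sum_c p(c) g_c
at_mixture = avg_divergence(weights, dists, mixture)

# Mutual information of the joint distribution p(c) g_c(x).
joint = weights[:, None] * dists
mutual_info = kl(joint, np.outer(weights, mixture))

# Random candidates for the reference distribution never do better than the mixture.
trials = rng.random((200, 5))
trials /= trials.sum(axis=1, keepdims=True)
best_trial = min(avg_divergence(weights, dists, q) for q in trials)
```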



C. Direct sum of boxes. $X_u$ and $X_{\max}$ are not equal.



As it was mentioned, $X_{\max} \ge X_u$. In this section we prove that these two measures are different. More precisely, we find their values on a direct sum of two boxes in terms of their values on the boxes themselves in theorem 2. Using this result, we show among others that these measures differ on any direct sum of a contextual box and a non-contextual one in corollary 2. To begin with, we show that in formula (6) we can indeed write minimum instead of infimum. We will show more, proving that in the definition of $X_{\{p(c)\}}$ one can also consider the minimum. Taking then the uniform distribution, we get the thesis for $X_u$.

Recall that for a box $B = \{g(\lambda_c)\}$,

$X_{\{p(c)\}}(B) = \inf_{\{p(\lambda)\}} \sum_{c \in E_G} p(c) D(g(\lambda_c)\|p(\lambda_c))$ (22)

which depends on both the box B and the probability distribution over contexts $\{p_c\}_c$. Recall here also that the minimization is taken over all joint probability distributions $p(\lambda)$ defined on outputs of observables, and $p(\lambda_c)$ are marginals, restricted to the context c. This quantity can be written as (using quantum notation for classical distributions)

$X_{\{p(c)\}}(B) = \inf_\sigma S(\rho\|\sigma)$ (23)

where $\rho = \sum_c p_c |c\rangle\langle c| \otimes \rho_c$ and $\sigma = \sum_c p_c |c\rangle\langle c| \otimes \sigma_c$, with $\sigma_c$ representing the probability distributions $p(\lambda_c)$ and $\rho_c$ the distributions $g(\lambda_c)$ of the box B. One finds that the set of states $\sigma$ is convex and compact (note that $\{p_c\}$ is fixed here). Therefore, since relative entropy is lower semicontinuous [52], there exists a state $\sigma^*$ which achieves the infimum. This finishes the proof.

We now introduce the definition of a direct sum of hypergraphs and boxes.

Definition 3 For two hypergraphs $G_1 = (V_{G_1}, E_{G_1})$ and $G_2 = (V_{G_2}, E_{G_2})$, a direct sum of $G_1$ and $G_2$ is $G_1 \oplus G_2 := (V_{G_1 \oplus G_2}, E_{G_1 \oplus G_2})$ with $V_{G_1 \oplus G_2} = V_{G_1} \cup V_{G_2}$ and $E_{G_1 \oplus G_2} = E_{G_1} \cup E_{G_2}$. For any two boxes $B_1 = \{g(\lambda_c)\}_{c \in E_{G_1}}$ and $B_2 = \{g(\lambda_{c'})\}_{c' \in E_{G_2}}$ compatible with hypergraphs $G_1$ and $G_2$ respectively, their direct sum is a box $B_1 \oplus B_2 := \{g(\lambda_c)\}_{c \in E_{G_1 \oplus G_2}}$.

In the next part of this section, we use the following notation. By $p(\lambda)[V]$ we mean any joint probability distribution of the outputs of observables from the set V, and by $p(\lambda)|_V$ the marginal probability distribution of $p(\lambda)$, defined on the outputs of observables of the set V. Moreover, by $D(g(\lambda_c)\|p(\lambda_c))|_{p(\lambda)}$ we mean the relative entropy distance between the distribution of the output of variables from context c of the box $\{g(\lambda_c)\}$ and that from context c of the non-contextual box defined by the distribution $p(\lambda)$.

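We will now need a lemma which simplifies the computation of $X_u$ and $X_{\max}$ of a direct sum of boxes, as it states that it is enough to take the minimization in both quantities only over product distributions.

Lemma 3 For any two hypergraphs $G_1 = (V_{G_1}, E_{G_1})$ and $G_2 = (V_{G_2}, E_{G_2})$ and boxes $B_1 \in C_{G_1}^{(n_1)}$ and $B_2 \in C_{G_2}^{(n_2)}$ there holds:

$X_u(B_1 \oplus B_2) = \min_{p(\lambda)[V_{G_1}] p(\lambda)[V_{G_2}]} \sum_{c \in E_{G_1} \cup E_{G_2}} \frac{1}{n_1 + n_2} D(g(\lambda_c)\|p(\lambda_c))$, (24)

$X_{\max}(B_1 \oplus B_2) = \sup_{\{p(c)\}} \min_{p(\lambda)[V_{G_1}] p(\lambda)[V_{G_2}]} \sum_{c \in E_{G_1} \cup E_{G_2}} p(c) D(g(\lambda_c)\|p(\lambda_c))$. (25)

Proof.

To see both equalities we observe that for any distribution $\{p(c)\}$ and any distribution $p(\lambda)[V_{G_1 \oplus G_2}]$, there holds

$\sum_{c \in E_{G_1} \cup E_{G_2}} p(c) D(g(\lambda_c)\|p(\lambda_c))|_{p(\lambda)[V_{G_1 \oplus G_2}]}$
$= \sum_{c \in E_{G_1}} p(c) D(g(\lambda_c)\|p(\lambda_c))|_{p(\lambda)[V_{G_1 \oplus G_2}]|_{V_{G_1}}} + \sum_{c \in E_{G_2}} p(c) D(g(\lambda_c)\|p(\lambda_c))|_{p(\lambda)[V_{G_1 \oplus G_2}]|_{V_{G_2}}}$
$= \sum_{c \in E_{G_1} \cup E_{G_2}} p(c) D(g(\lambda_c)\|p(\lambda_c))|_{p(\lambda)[V_{G_1 \oplus G_2}]|_{V_{G_1}} \, p(\lambda)[V_{G_1 \oplus G_2}]|_{V_{G_2}}}$. (26)

This is because, by the definition of $B_1 \oplus B_2$, contexts from $E_{G_1}$ depend only on variables from $V_{G_1}$, similarly as contexts from $E_{G_2}$ depend only on $V_{G_2}$. Hence $X_{\{p(c)\}}(B_1 \oplus B_2) = \min_{p(\lambda)[V_{G_1}] p(\lambda)[V_{G_2}]} \sum_{c \in E_{G_1} \cup E_{G_2}} p(c) D(g(\lambda_c)\|p(\lambda_c))$, which for $p(c) = \frac{1}{n_1 + n_2}$ for all c implies (24). Taking the supremum over $\{p(c)\}$, we obtain (25). ∎

We are ready to show our main tool, interesting on its own, which is the following theorem that expresses $X_u$ and $X_{\max}$ of a direct sum of two boxes in terms of these functions of these boxes.

Theorem 2 For any two hypergraphs $G_1$ and $G_2$ and boxes $B_1 \in C_{G_1}^{(n_1)}$ and $B_2 \in C_{G_2}^{(n_2)}$ there holds:

$X_u(B_1 \oplus B_2) = \frac{n_1}{n_1 + n_2} X_u(B_1) + \frac{n_2}{n_1 + n_2} X_u(B_2)$, (27)

and

$X_{\max}(B_1 \oplus B_2) = \max\{X_{\max}(B_1), X_{\max}(B_2)\}$. (28)

Proof.

To see both the above statements, we observe that for a distribution $\{p(c)\}$ such that $w = \sum_{c \in E_{G_1}} p(c) \neq 0$ and $w \neq 1$, and for any two distributions $p(\lambda)[V_{G_1}]$ and $p(\lambda)[V_{G_2}]$, we have

$\sum_{c \in E_{G_1} \cup E_{G_2}} p(c) D(g(\lambda_c)\|p(\lambda_c))|_{p(\lambda)[V_{G_1}] p(\lambda)[V_{G_2}]}$
$= w \sum_{c \in E_{G_1}} \frac{p(c)}{w} D(g(\lambda_c)\|p(\lambda_c))|_{p(\lambda)[V_{G_1}]} + (1 - w) \sum_{c \in E_{G_2}} \frac{p(c)}{1 - w} D(g(\lambda_c)\|p(\lambda_c))|_{p(\lambda)[V_{G_2}]}$. (29)

This immediately gives:

$\min_{p(\lambda)[V_{G_1}] p(\lambda)[V_{G_2}]} \sum_{c \in E_{G_1} \cup E_{G_2}} p(c) D(g(\lambda_c)\|p(\lambda_c))|_{p(\lambda)[V_{G_1}] p(\lambda)[V_{G_2}]}$
$= w \min_{p(\lambda)[V_{G_1}]} \sum_{c \in E_{G_1}} \frac{p(c)}{w} D(g(\lambda_c)\|p(\lambda_c)) + (1 - w) \min_{p(\lambda)[V_{G_2}]} \sum_{c \in E_{G_2}} \frac{p(c)}{1 - w} D(g(\lambda_c)\|p(\lambda_c))$. (30)

Substituting $p(c) = \frac{1}{n_1 + n_2}$ in the above equation, we obtain by lemma 3 that

$X_u(B_1 \oplus B_2) = \frac{n_1}{n_1 + n_2} \left[ \min_{p(\lambda)[V_{G_1}]} \frac{1}{n_1} \sum_{c \in E_{G_1}} D(g(\lambda_c)\|p(\lambda_c)) \right] + \frac{n_2}{n_1 + n_2} \left[ \min_{p(\lambda)[V_{G_2}]} \frac{1}{n_2} \sum_{c \in E_{G_2}} D(g(\lambda_c)\|p(\lambda_c)) \right]$
$= \frac{n_1}{n_1 + n_2} X_u(B_1) + \frac{n_2}{n_1 + n_2} X_u(B_2)$, (31)

which proves the statement (27).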



We now pass to prove the statement (28). We can assume w.l.g. that $X_{\max}(B_1) \ge X_{\max}(B_2)$. By the definition of $X_{\max}$, for any $\delta_k > 0$ there exists $\{p_k(c)\}$ such that $X_{\max}(B_1 \oplus B_2) \le X_{\{p_k(c)\}}(B_1 \oplus B_2) + \delta_k$. We will argue now that for any k and decreasing $\delta_k$ there holds $X_{\max}(B_1 \oplus B_2) \le \max\{X_{\max}(B_1), X_{\max}(B_2)\} + \delta_k$, which will prove the desired upper bound in the limit $k \to \infty$. The proof then follows from the fact that this upper bound can be attained by taking $w_k = \sum_{c \in E_{G_1}} p_k(c) = 1$ and such $\{p_k(c)\}$ that attain the supremum in $X_{\max}(B_1)$ in the limit of large k.

We need to consider only two cases: $\{p_k(c)\}$ is such that $w_k = 1$ (case 1) or $0 < w_k < 1$ (case 2). In the first case we have $X_{\max}(B_1 \oplus B_2) \le X_{\{p_k(c)\}}(B_1) + \delta_k \le X_{\max}(B_1) + \delta_k \le \max\{X_{\max}(B_1), X_{\max}(B_2)\} + \delta_k$, which we aimed to prove. Thus, it is enough to show that in the second case $X_{\max}(B_1 \oplus B_2)$ is upper bounded by $X_{\max}(B_1) + \delta_k$, since then the first case yields the optimal value of $X_{\max}(B_1 \oplus B_2)$. Suppose then that $\{p_k(c)\}$ satisfies $0 < w_k = \sum_{c \in E_{G_1}} p_k(c) < 1$. This implies that $\{\frac{p_k(c)}{w_k}\}_{c \in E_{G_1}}$ and $\{\frac{p_k(c)}{1 - w_k}\}_{c \in E_{G_2}}$ are valid distributions, hence from (30), by lemma 3, we have

$X_{\max}(B_1 \oplus B_2) \le X_{\{p_k(c)\}}(B_1 \oplus B_2) + \delta_k = w_k X_{\{\frac{p_k(c)}{w_k}\}}(B_1) + (1 - w_k) X_{\{\frac{p_k(c)}{1 - w_k}\}}(B_2) + \delta_k$. (32)

By the definition of $X_{\max}$ we have $X_{\{\frac{p_k(c)}{w_k}\}}(B_1) \le X_{\max}(B_1)$ and $X_{\{\frac{p_k(c)}{1 - w_k}\}}(B_2) \le X_{\max}(B_2)$, which gives from the above equality

$X_{\max}(B_1 \oplus B_2) \le w_k X_{\max}(B_1) + (1 - w_k) X_{\max}(B_2) + \delta_k \le X_{\max}(B_1) + \delta_k = \max\{X_{\max}(B_1), X_{\max}(B_2)\} + \delta_k$, (33)

hence, as we explained, the assertion follows. ∎

The above theorem can be easily generalized to any finite direct sum of boxes, giving that $X_u$ is the average value of the $X_u$ of the particular boxes from the direct sum (with weights given by their numbers of contexts), and $X_{\max}$ is the maximal value of $X_{\max}$ on the particular boxes. We can now state the main application of this theorem.
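The two composition rules can be written as a pair of small helper functions. This is a sketch only: the function names are hypothetical, and the numerical values of $X_u$ and $X_{\max}$ used in the example are made-up placeholders rather than values computed in the paper.

```python
def x_u_direct_sum(parts):
    """X_u of a finite direct sum; `parts` is a list of (number_of_contexts, X_u) pairs."""
    total_contexts = sum(n for n, _ in parts)
    return sum(n * x for n, x in parts) / total_contexts


def x_max_direct_sum(values):
    """X_max of a finite direct sum, given the X_max of each summand."""
    return max(values)


# Hypothetical example: a contextual box with 4 contexts and a non-contextual box with 2.
xu_sum = x_u_direct_sum([(4, 0.4), (2, 0.0)])
xmax_sum = x_max_direct_sum([0.4, 0.0])
```

With these placeholder numbers the weighted average is strictly below the maximum, which is the kind of gap between the two measures that the corollaries following the theorem make precise.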

Corollary 1 For any two hypergraphs $G_1$ and $G_2$, and boxes $B_1 \in C_{G_1}^{(n_1)}$ and $B_2 \in C_{G_2}^{(n_2)}$ with $n_1, n_2 \ge 1$, such that $X_u(B_1) \neq X_u(B_2)$, there holds

$X_u(B_1 \oplus B_2) < X_{\max}(B_1 \oplus B_2)$. (34)

Proof.

Since $X_u(B_1) \neq X_u(B_2)$, we can w.l.g. assume $X_u(B_1) > X_u(B_2)$. This implies, by theorem 2,

$X_u(B_1 \oplus B_2) = \frac{n_1}{n_1 + n_2} X_u(B_1) + \frac{n_2}{n_1 + n_2} X_u(B_2) < X_u(B_1) \le X_{\max}(B_1) \le \max\{X_{\max}(B_1), X_{\max}(B_2)\} = X_{\max}(B_1 \oplus B_2)$, (35)

which proves the corollary. ∎

From the above corollary we obtain immediately another one:

Corollary 2 For any two hypergraphs $G_1$ and $G_2$, a contextual box $B \in C_{G_1}^{(n_1)}$ and a non-contextual box $B_{nc} \in C_{G_2}^{(n_2)}$ with $n_1, n_2 \ge 1$, there holds

$X_u(B \oplus B_{nc}) = \frac{n_1}{n_1 + n_2} X_u(B) < X_{\max}(B \oplus B_{nc}) = X_{\max}(B)$. (36)

Proof.

It is enough to observe that $X_u$ is faithful, hence $X_u(B) > X_u(B_{nc}) = 0$, and corollary 1 applies. ∎

The above corollary states that $X_u$ and $X_{\max}$ differ on certain direct sums of boxes. An example is $PR \oplus PR_{1/2}$, since $PR_{1/2}$ is the maximally mixed box, which is clearly non-contextual, and PR is contextual. There are also quantum boxes of this kind, i.e. boxes that originate from performing certain measurements on a quantum state. An example is defined as follows:

Consider the maximally entangled state $|\psi^-\rangle = \frac{1}{\sqrt{2}}(|0\rangle_1|1\rangle_2 - |1\rangle_1|0\rangle_2)$ and consider the hypergraph
with G := {Vg,Eg) where Vg - {^1,^1, 
^2^, R{Xt) ® R{X2), R{Zi) ® R{Z2) } and Eg := 
{{Xi, {Xi, ^a^}, {Zi, {Z,, ^^}, 

$\{R(X_1) \otimes R(X_2), R(Z_1) \otimes R(Z_2)\}\}$. X and Z are Pauli matrices and $R(\cdot)$ represents the rotation of a Pauli matrix around the y axis by angle $\pi/8$. We then consider a box B obtained via measuring observables from $V_G$ on the state $|\psi^-\rangle$ in groups defined by contexts. It is easy to see that this box is a direct sum of the most non-local quantum box with two binary inputs and two binary outputs, defined by the first 4 observables and the first 4 contexts, and a box with a single context, which is by definition non-contextual.



D. Twirling and isotropic boxes. Simplifying computation of $X_u$

In order to compute $X_u$ for the isotropic xor-boxes (see example (1)) and the Klyachko box, we first observe that these boxes have numerous symmetries, i.e. they are invariant under some non-contextuality preserving operations. In this paragraph we specify groups of such operations and a map which applies them at random, called twirling. This leads us to the definition of isotropic boxes and the main result of this section (theorem 4), which shows that for these boxes it is enough to minimize in the definition of $X_u$ only over non-contextual isotropic boxes.

To be more precise, consider any hypergraph G with n contexts and a box $B \in C_G^{(n)}$. A non-contextuality preserving operation L satisfying $L(B) = B$ we call a non-contextuality preserving automorphism of B. For any finite set of non-contextuality preserving automorphisms $\mathcal{L}$, if the group generated by the set $\mathcal{L}$ (denoted as $G_{\mathcal{L}}$) is finite of order $|G_{\mathcal{L}}|$, then the map defined on boxes $F \in C_G^{(n)}$ as

$F \mapsto \sum_{l \in G_{\mathcal{L}}} \frac{1}{|G_{\mathcal{L}}|} l(F)$, (37)

we call B-$\mathcal{L}$-twirling and denote as $\tau_B^{\mathcal{L}}$. The image of the set of all boxes through B-$\mathcal{L}$-twirling we call the set of B-$\mathcal{L}$-isotropic states:

$\mathcal{I}_B^{\mathcal{L}} := \{\tau_B^{\mathcal{L}}(F) : F \in C_G^{(n)}\}$. (38)

Note that there may be different twirlings depending on the set of generators $\mathcal{L}$ of $G_{\mathcal{L}}$. However, when the results are true for any fixed choice of $\mathcal{L}$, or the set $\mathcal{L}$ is known from the context, we will omit it in the notation, denoting the introduced objects as B-twirling ($\tau_B$) and the set of B-isotropic boxes ($\mathcal{I}_B$).

We observe that to find the set of B-isotropic boxes we need not apply $\tau_B^{\mathcal{L}}$. By theorem 3, which we prove below, the set $\mathcal{I}_B^{\mathcal{L}}$ is equal to the set of boxes invariant under the elements of $\mathcal{L}$. This theorem is true for any subset of a linear space, but for clarity we state it for the set of consistent boxes.

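Theorem 3 For a hypergraph G and the set of consistent boxes $C_G^{(n)}$ compatible with this graph, let $\mathcal{F}$ be a finite group of linear maps $L: C_G^{(n)} \to C_G^{(n)}$, and $\mathcal{H} = \{h_1, \ldots, h_n\} \subseteq \mathcal{F}$ a subset of its elements such that each of them has its inverse $h_i^{-1}$ in $\mathcal{H}$. Let us define the family of boxes B invariant under the transformations $h_i$:

$\mathcal{I} := \{B \in C_G^{(n)} : \forall_i \; h_i(B) = B\}$, (39)

and the subgroup $\mathcal{F}_{\mathcal{H}} \subseteq \mathcal{F}$ generated by $\mathcal{H}$. We then have the following:

$\mathrm{Im}_{\mathcal{F}_{\mathcal{H}}}(C_G^{(n)}) := \{D \in C_G^{(n)} : \exists_{B \in C_G^{(n)}} \; \sum_{f \in \mathcal{F}_{\mathcal{H}}} \frac{1}{|\mathcal{F}_{\mathcal{H}}|} f(B) = D\} = \mathcal{I}$. (40)

Proof.

Let $B \in \mathrm{Im}_{\mathcal{F}_{\mathcal{H}}}(C_G^{(n)})$, i.e. $B = \sum_{f \in \mathcal{F}_{\mathcal{H}}} \frac{1}{|\mathcal{F}_{\mathcal{H}}|} f(B')$ for some box $B'$. Then for each i we have:

$h_i(B) = h_i\left(\sum_{f \in \mathcal{F}_{\mathcal{H}}} \frac{1}{|\mathcal{F}_{\mathcal{H}}|} f(B')\right)$ (41)
$= \sum_{f \in \mathcal{F}_{\mathcal{H}}} \frac{1}{|\mathcal{F}_{\mathcal{H}}|} (h_i \circ f)(B')$ (42)
$= \sum_{f' = h_i \circ f \in \mathcal{F}_{\mathcal{H}}} \frac{1}{|\mathcal{F}_{\mathcal{H}}|} f'(B') = B$, (43)

where in the first step we use linearity of the maps $h_i$ and in the last we use the fact that $f'$ runs through the whole group $\mathcal{F}_{\mathcal{H}}$, since each $h_i$ has its inverse. From the above we see that $B \in \mathcal{I}$, and so $\mathrm{Im}_{\mathcal{F}_{\mathcal{H}}}(C_G^{(n)}) \subseteq \mathcal{I}$.

On the other hand, for each box $B \in \mathcal{I}$ we have:

$B = \sum_{f \in \mathcal{F}_{\mathcal{H}}} \frac{1}{|\mathcal{F}_{\mathcal{H}}|} f(B)$, (44)

because for all $f \in \mathcal{F}_{\mathcal{H}}$, $f = h_{i_1} \circ \ldots \circ h_{i_k}$ ($h_{i_j} \in \mathcal{H}$), and so $f(B) = B$, from which we arrive at Eq. (44). Thus, we showed that $\mathcal{I} \subseteq \mathrm{Im}_{\mathcal{F}_{\mathcal{H}}}(C_G^{(n)})$ which, jointly with the opposite inclusion, proves the theorem. ∎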
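The content of the theorem can be illustrated on a toy model in which a box is a plain probability vector and each generator acts by permuting its entries. This is only a sketch under that simplifying assumption; the functions `generate_group`, `apply` and `twirl`, the two generators and the test vectors are all hypothetical and chosen for illustration.

```python
import numpy as np


def generate_group(generators):
    """Close a list of index permutations (tuples) under composition."""
    identity = tuple(range(len(generators[0])))
    group = {identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for h in generators:
            composed = tuple(h[i] for i in g)  # composed[k] = h[g[k]]
            if composed not in group:
                group.add(composed)
                frontier.append(composed)
    return sorted(group)


def apply(perm, vec):
    """Act with an index permutation on a vector: result[k] = vec[perm[k]]."""
    return np.asarray(vec, dtype=float)[list(perm)]


def twirl(group, vec):
    """Uniform average of the images of `vec` under all group elements."""
    return sum(apply(f, vec) for f in group) / len(group)


# Two toy generators on a 6-entry vector: a 3-cycle on entries 0,1,2 and a swap of entries 3,4.
generators = [(1, 2, 0, 3, 4, 5), (0, 1, 2, 4, 3, 5)]
group = generate_group(generators)

vec = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6])
twirled = twirl(group, vec)

# A vector that is already invariant under both generators.
invariant = np.array([1.0, 1.0, 1.0, 2.0, 2.0, 7.0])
```

In this toy setting the twirled vector is fixed by every generator, and a vector fixed by every generator is left unchanged by twirling, mirroring the two inclusions in the proof.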

Consider now a specific set of non-contextuality preserving automorphisms $\mathcal{L}_0$, which is any set of compositions of two types of linear maps: (i) $\pi_i$, permutations of observables, and (ii) $b_i$, negations of outputs of observables. For this set we have a general theorem which allows for easier evaluation of the relative entropy of contextuality.

Theorem 4 For any box $B \in C_G^{(n)}$ and a set of B-$\mathcal{L}_0$-isotropic boxes $\mathcal{I}_B^{\mathcal{L}_0}$ we have:

$X_u(B) = \min_{p(\lambda) \in \mathcal{I}_B^{\mathcal{L}_0}} \sum_c \frac{1}{n} D(g(\lambda_c)\|p(\lambda_c))$, (45)

where the minimum is taken over all probability distributions $p(\lambda)$ which give rise to a non-contextual box from the set of B-$\mathcal{L}_0$-isotropic boxes $\mathcal{I}_B^{\mathcal{L}_0}$.
Proof.

Let $p(\lambda)$ be optimal for $X_u(B)$, and denote the non-contextual box defined by this distribution as $B_{nc}$. Because of the choice of $\mathcal{L}_0$, for any element f in the group $G_{\mathcal{L}_0}$ generated by this set, there is

$X_u(B) = \sum_c \frac{1}{n} D(g_f(\lambda_c)\|p_f(\lambda_c))$, (46)

where $g_f(\lambda_c)$ and $p_f(\lambda_c)$ are the distributions of context c of a box $f(B)$ and a box $f(B_{nc})$ respectively. To see this, we note that, by the definition of $\mathcal{L}_0$, f is a composition of permutations of observables and bit-flips of their outputs. It is then enough to prove separately that the above equality holds for f being one of them. Consider first f to be a permutation of observables. Since $f(B) = B$, it is also an automorphism of the G with which B is compatible, hence it is a special permutation of observables which induces a permutation of the contexts and in turn of the elements $D(g(\lambda_c)\|p(\lambda_c))$. It means that applying f induces just a change of the order of summation in the definition of $X_u$. Second, if f is a bit-flip, since it is applied to both $g(\lambda_c)$ and $p(\lambda_c)$, it does not change the relative entropy, which is invariant under doubly applied reversible operations [51]. Thus we have:

$X_u(B) = \sum_c \frac{1}{n} \sum_{f \in G_{\mathcal{L}_0}} \frac{1}{|G_{\mathcal{L}_0}|} D(g_f(\lambda_c)\|p_f(\lambda_c))$
$\ge \sum_c \frac{1}{n} D\left(\sum_{f \in G_{\mathcal{L}_0}} \frac{1}{|G_{\mathcal{L}_0}|} g_f(\lambda_c) \,\Big\|\, \sum_{f \in G_{\mathcal{L}_0}} \frac{1}{|G_{\mathcal{L}_0}|} p_f(\lambda_c)\right)$, (47)

where in the second line we used the joint convexity of relative entropy. What we obtain is the fact that such a process of symmetrization cannot increase the relative entropy. What is more, since f is an automorphism of B, we have that for each context c:

$\sum_{f \in G_{\mathcal{L}_0}} \frac{1}{|G_{\mathcal{L}_0}|} g_f(\lambda_c) = g(\lambda_c)$. (48)

We observe now that, since $\tau_B^{\mathcal{L}_0}$ preserves non-contextuality, the box $\tau_B^{\mathcal{L}_0}(B_{nc})$ has a context c equal to $\sum_{f \in G_{\mathcal{L}_0}} \frac{1}{|G_{\mathcal{L}_0}|} p_f(\lambda_c)$ and it is a non-contextual box. Since $B_{nc}$ is optimal for $X_u$ and, when we substitute the box $\tau_B^{\mathcal{L}_0}(B_{nc})$ in place of $B_{nc}$, we cannot increase the quantity $X_u$ due to inequality (47), the box $\tau_B^{\mathcal{L}_0}(B_{nc})$ must also be optimal for $X_u$, which proves the desired equality in (47). We have $\tau_B^{\mathcal{L}_0}(B_{nc}) \in \mathcal{I}_B^{\mathcal{L}_0}$, hence the assertion follows. ∎



E. Computing $X_u$ for the exemplary isotropic xor-boxes

In this section we specify the twirling operations $\mathcal{L}_0$ for the xor-boxes, hence showing that one can obtain the isotropic xor-boxes introduced in (1) by operations that are non-contextuality preserving. This is crucial, since then we can use Theorem 4 to compute $X_u$ for these boxes, which is done in Theorem 7.

To begin with, we introduce twirling for the PM box by specifying the set $\mathcal{L}_0$ which leads to a one-parameter family of isotropic boxes.

• For the PM box we choose $\mathcal{L}_0 = \{h_1, \ldots, h_8\}$, where the elements $h_i$ are in general compositions of maps (i) and (ii): $h_1, \ldots, h_6$ are the 3! permutations of the contexts $\{A_1, A_2, A_3\}$, $\{A_4, A_5, A_6\}$, $\{A_7, A_8, A_9\}$, i.e. the rows on Fig. 1c; $h_7$ is a swap of $\{A_1, A_4, A_7\}$ and $\{A_2, A_5, A_8\}$, i.e. a swap of the two solid columns on Fig. 1c; $h_8$ is a composition of the permutation defined by the mappings $A_4 \leftrightarrow A_2$, $A_7 \leftrightarrow A_3$, $A_8 \leftrightarrow A_6$ (the rest of the observables are mapped to themselves) with a bit-flip of the output of observable $A_9$. This operation is a reflection of the hypergraph w.r.t. the diagonal, with an appropriate bit-flip, on Fig. 1c. The set $\mathcal{I}_{PM}^{\mathcal{L}_0}$ we call the set of isotropic PM boxes. The reason for this is stated in the lemma below:

Lemma 4 There holds:

$\mathcal{I}_{PM}^{\mathcal{L}_0} = \{\alpha PM + (1 - \alpha)PM' \,|\, \alpha \in [0, 1]\}$, (49)

where $PM'$ is an opposite version of the box PM, i.e. PM with $P_{even}^{(3)}$ exchanged with $P_{odd}^{(3)}$ and vice versa.
Proof.

To see the above statement, we will use Theorem 3. Due to this theorem it is enough to argue that invariance of a box under $\mathcal{L}_0$ implies that it belongs to the set on the right-hand side of (49). In the proof we will refer to Fig. 1c. First, due to invariance under $h_1, \ldots, h_6$ the rows need to have the same probability distribution. Second, using $h_8$ we obtain that the middle row and the middle column have the same distributions. Third, by $h_7$ we get that all solid lines have the same distribution, with 8 probabilities $q(ijk)$ of the string $(ijk)$, where i, j, k are binary. Due to invariance under $h_1, \ldots, h_6$, both the solid columns and the dashed column are permutationally symmetric, i.e. they are described only by $q(000)$, $q(001)$, $q(011)$ and $q(111)$ ($r(000)$, $r(001)$, $r(011)$ and $r(111)$ for the dashed column). Invariance under the operation $h_8$ imposes $q(000) = r(001)$ and $q(011) = r(010)$, which equalizes $q(000)$ and $q(011)$ because of $r(001) = r(010)$. Thus $q(000) = \alpha/4$ for some parameter $\alpha \in [0, 1]$. Similarly, we have $q(111) = r(110)$, $q(010) = r(011)$, which implies that $q(111) = q(010) = (1 - \alpha)/4$. Exchanging r with q, we get that also $r(000) = r(011) = (1 - \alpha)/4$ and $r(111) = r(001) = \alpha/4$, which ends the proof. ∎

The argument given in the above lemma is analogous in the case of the other xor-boxes considered in this paper, where in particular we have:

• For the M box we choose $\mathcal{L}_0 = \{h_1, \ldots, h_{10}\}$, where: $h_1$ is a reflection of the star with respect to the Aa symmetry line, $h_2$ is a reflection of the star with respect to the Cc symmetry line with a bit flip on the node c, $h_3$ is a reflection of the star with respect to the Dd symmetry line with a bit flip on the node d, $h_4$ is a reflection of the star with respect to the Ee symmetry line with a bit flip on the node E, $h_5$ is a reflection of the star with respect to the Bb symmetry line with a bit flip on the node B, and $h_{6-10}$ are bit flips on three nodes that form any triangle on the hypergraph (Acd, Bde, etc.). For $\mathcal{L}_0$ defined in this way there holds:

$\mathcal{I}_M^{\mathcal{L}_0} = \{\alpha M + (1 - \alpha)M' \,|\, \alpha \in [0, 1]\}$, (50)

and the set of these boxes we call isotropic M boxes.

• For the $CH^{(n)}$ box we choose $\mathcal{L}_0 = \{h_1, \ldots, h_{n-1}\}$, where $h_j$ is a composition of a cyclic permutation of contexts such that all $\{A_i, A_{i+1}\} \to \{A_{i+j}, A_{i+j+1}\}$ with bit flips on the observables $A_1, \ldots, A_j$. For $\mathcal{L}_0$ defined in this way there holds:

$\mathcal{I}_{CH^{(n)}}^{\mathcal{L}_0} = \{\alpha CH^{(n)} + (1 - \alpha)CH'^{(n)} \,|\, \alpha \in [0, 1]\}$, (51)

and the set of these boxes we call isotropic $CH^{(n)}$ boxes.

Let us now fix a contextual box $B = \{g(\lambda_c)\}$ and denote by $g(\lambda_c)_i$ the probability of the outcome i in the distribution $g(\lambda_c)$ under a measurement of the context c on the box B. For a box $\tilde{B} = \{\tilde{g}(\lambda_c)\}$ compatible with the same hypergraph G as B, we define the quantity $\beta_B(\tilde{B})$, which measures how contextual the box $\tilde{B}$ is w.r.t. the box B:

$\beta_B(\tilde{B}) := \sum_c \sum_{i \in \mathrm{supp}(g(\lambda_c))} \tilde{g}(\lambda_c)_i$, (52)

where $\tilde{g}(\lambda_c)_i$ are probabilities of outcomes within a given context, and $\mathrm{supp}(g(\lambda_c))$ is the support of the distribution $g(\lambda_c)$, i.e. the set of the outcomes of a measurement of the context c which have nonzero probability in the distribution $g(\lambda_c)$.

We will need some properties of $\beta_B(\cdot)$, which are collected in the lemma below, where we treat boxes as vectors of probabilities.

Lemma 5 For any box $\tilde{B} \in C_G^{(n)}$ and a xor-box $B \in C_G^{(n)}$ with all contexts of the same cardinality m and all observables of the same cardinality 2, there holds:

$\beta_B(\tilde{B}) = 2^{(m-1)} \langle B|\tilde{B}\rangle$, (53)

where $\langle \cdot|\cdot \rangle$ is the Euclidean scalar product of vectors. Moreover, for a twirling $\tau_B^{\mathcal{L}_0}$ there holds:

$\beta_B(\tau_B^{\mathcal{L}_0}(\tilde{B})) = \beta_B(\tilde{B})$. (54)

Proof.

The first statement is easy, as $2^{(m-1)}B$ is a vector of 1's for the probabilities for which $g(\lambda_c) > 0$, where $B = \{g(\lambda_c)\}$, hence the scalar product sums the probabilities of the box $\tilde{B}$ from the support of the box B. To see the next, consider the following chain of equalities:

$\beta_B(\tau_B^{\mathcal{L}_0}(\tilde{B})) = 2^{(m-1)} \langle \sum_{f \in G_{\mathcal{L}_0}} \frac{1}{|G_{\mathcal{L}_0}|} f(\tilde{B}) | B \rangle$
$= \sum_{f \in G_{\mathcal{L}_0}} \frac{1}{|G_{\mathcal{L}_0}|} 2^{(m-1)} \langle f(\tilde{B})|B\rangle$
$= \sum_{f \in G_{\mathcal{L}_0}} \frac{1}{|G_{\mathcal{L}_0}|} 2^{(m-1)} \langle f(\tilde{B})|f(B)\rangle$
$= \sum_{f \in G_{\mathcal{L}_0}} \frac{1}{|G_{\mathcal{L}_0}|} 2^{(m-1)} \langle \tilde{B}|B\rangle = \beta_B(\tilde{B})$,

where in the second equality we use linearity of the scalar product, in the third we use the fact that by the definition of twirling f is an automorphism of B, and in the fourth we use the fact that each f is a composition of elements from $\mathcal{L}_0$, i.e. permutations of observables and bit flips of outputs, hence it is a permutation, which does not change the scalar product. ∎
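The first statement of the lemma is a purely algebraic identity and can be checked numerically. The sketch below assumes a box is stored as an array with one row per context and one column per outcome string; the helpers `xor_box` and `beta`, the choice of six contexts of three binary observables, and the random comparison box (which is not required to be consistent for this identity) are illustrative assumptions.

```python
import numpy as np
from itertools import product


def xor_box(parities, m):
    """Rows are contexts; each row is uniform over the m-bit strings with the given parity."""
    outcomes = list(product((0, 1), repeat=m))
    box = np.zeros((len(parities), len(outcomes)))
    for c, parity in enumerate(parities):
        for i, outcome in enumerate(outcomes):
            if sum(outcome) % 2 == parity:
                box[c, i] = 2.0 ** -(m - 1)
    return box


def beta(reference, other):
    """Sum of the probabilities of `other` on the support of `reference`, over all contexts."""
    return float(other[reference > 0].sum())


m = 3
B = xor_box([0, 0, 0, 0, 0, 1], m)           # five even contexts and one odd context
B_opposite = xor_box([1, 1, 1, 1, 1, 0], m)  # the opposite version of B

rng = np.random.default_rng(2)
other = rng.random(B.shape)
other /= other.sum(axis=1, keepdims=True)    # arbitrary family of distributions

lhs = beta(B, other)
rhs = 2 ** (m - 1) * float(np.sum(B * other))
```

With these definitions `beta(B, B)` equals the number of contexts and `beta(B, B_opposite)` vanishes, since a box and its opposite version have disjoint supports in every context.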

Based on $\beta_B$ we can naturally build a contextuality inequality, which for the PM box is equivalent to that given in [21], for the PR box to that given in [53], and for the $CH^{(n)}$ box to that from [41] (see also [50]).

Theorem 5 For a xor-box $B \in C_G^{(n)}$ with a single context with distribution $P_{odd}$, such that each vertex from $V_G$ belongs to an even number of contexts, and for a non-contextual box $\tilde{B} \in C_G^{(n)}$, there holds

$\beta_B(\tilde{B}) \le n - 1$, (55)

and the bound is tight.
Proof. 

In what follows, we generalize the argument of N.D. Mermin [15], with the use of which he proved that the M box is contextual. Since any non-contextual box is a mixture of deterministic boxes, and by lemma 5 $\beta_B(\tilde{B})$ is linear, it suffices to prove the above inequality for deterministic ones. Surely, deterministic boxes can attain only discrete values of the LHS. Suppose then that for a non-contextual box LHS = n, i.e. all constraints of the contextual box are satisfied, meaning that for n - 1 contexts the sum of outputs is $\bigoplus_i o_i = 0$ (even) and for 1 context $\bigoplus_i o_i = 1$ (odd), which gives a total sum over all contexts equal to 1. On the other hand, for a deterministic assignment, summing all the values for the whole hypergraph we get $\bigoplus_i n_i o_i = 0$, since each $n_i$ (the number of contexts to which the observable $A_i$ belongs) is an even number by assumption. This gives the desired contradiction. The value of the RHS can be attained deterministically, e.g. by putting all the outcomes equal to 0, which simultaneously shows that the inequality is tight. ∎

Theorem 6 For a xor-box $B \in C_G^{(n)}$ with even n and a single context with distribution $P_{odd}$, such that each vertex from $V_G$ belongs to an even number of contexts, and for a non-contextual $\tilde{B} \in C_G^{(n)}$ we have:

$\beta_B(\tilde{B}) \ge 1$. (56)

Moreover, if the number of vertices in each context is odd then the bound is tight.
Proof.

The argument is analogous to the proof of Theorem 5. Again, we only need to consider deterministic assignments. Suppose there is a deterministic assignment of outcomes with LHS = 0. Then the box would satisfy all the constraints of the contextual opposite version of the box B. For this box, n - 1 contexts have the sum of outputs equal to $\bigoplus_i o_i = 1$ (odd) and for 1 context $\bigoplus_i o_i = 0$ (even), which, since n - 1 is odd, gives a total sum over all contexts equal to 1. This however is in contradiction with the fact that the sum over all vertices is 0, since each vertex appears an even number of times in the sum. Hence LHS $\ge 1$. To see the tightness in the special case, we observe that setting each vertex value to 1 constitutes a deterministic assignment that has $\beta_B$ equal to 1. Indeed, since each context has an odd number of vertices, each edge has distribution $P_{odd}$, and exactly one of them is in accordance with the box B. ∎
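For small hypergraphs both bounds can be confirmed by brute force over all deterministic assignments, for which $\beta_B$ is simply the number of contexts whose parity constraint is met. The sketch below is illustrative only: the function names are hypothetical, the 3x3 grid is taken as the layout of the PM square with the third column as the odd context, and the cycle with a single odd edge is an assumed stand-in for a chain-type xor-box.

```python
from itertools import product


def satisfied_range(num_vertices, contexts):
    """Min and max number of satisfied parity constraints over all 0/1 assignments.

    `contexts` is a list of (vertex_indices, required_parity) pairs.
    """
    counts = []
    for assignment in product((0, 1), repeat=num_vertices):
        counts.append(sum(
            1 for vertices, parity in contexts
            if sum(assignment[v] for v in vertices) % 2 == parity
        ))
    return min(counts), max(counts)


# 3x3 grid: rows and first two columns even, last column odd (n = 6 contexts of 3 vertices).
rows = [(0, 1, 2), (3, 4, 5), (6, 7, 8)]
cols = [(0, 3, 6), (1, 4, 7), (2, 5, 8)]
pm_contexts = [(r, 0) for r in rows] + [(cols[0], 0), (cols[1], 0), (cols[2], 1)]


def cycle_contexts(n):
    """Cycle of n two-vertex contexts, all even except the last, which is odd."""
    contexts = [((i, (i + 1) % n), 0) for i in range(n)]
    contexts[-1] = (contexts[-1][0], 1)
    return contexts


pm_range = satisfied_range(9, pm_contexts)
cycle4_range = satisfied_range(4, cycle_contexts(4))
cycle5_range = satisfied_range(5, cycle_contexts(5))
```

For the grid and the 4-cycle the count stays between 1 and n - 1, while for the 5-cycle the lower value 0 is reached, in line with the remark that evenness of n is needed for the lower bound.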

We note that the assumption about evenness of n in the above theorem is necessary:

Observation 2 For $B \in \{M, CH^{(n)}\}$ with odd n, there exists a non-contextual box $\tilde{B}$ such that $\beta_B(\tilde{B}) = 0$.
Proof.

There exists a deterministic assignment which sets $\beta_B$ to zero: first, we set all observables to 1, and then change into 0 k of those which do not belong to the context which has $P_{odd}$ in B, but such that each belongs to a disjoint pair of contexts. Such an assignment creates an opposite version of the box B, hence giving $\beta_B(\tilde{B}) = 0$. ∎

As we have seen, all examples of sets of isotropic xor- 
boxes considered so far are one parameter. We now 
prove the lemma which bounds this parameter for non- 
contextual boxes. 

Lemma 6 For a non-contextual box $B_\alpha \in I = I_{PM} \cup I_{M} \cup I_{CH^{(n)}}$ with $n$ contexts, there holds:

$$\alpha \le \frac{n-1}{n}. \qquad (57)$$

For even $n$ there holds additionally:

$$\alpha \ge \frac{1}{n}, \qquad (58)$$

while for odd $n$ there is

$$\alpha \ge 0, \qquad (59)$$

and the bounds are tight.
Proof.

By the definition of $\beta_B$, for any xor-box $B$ we have $\beta_B(B') = 0$, where $B'$ is an opposite version of box $B$ (with $P_{even}$ in place of $P_{odd}$ and vice versa). This implies that for any isotropic xor-box $B_\alpha \in I$ we have

$$\beta_B(B_\alpha) = n\alpha, \qquad (60)$$

and in particular, for non-contextual isotropic boxes $B_\alpha \in I$, by Theorem 5 we have

$$\alpha \le \frac{n-1}{n}. \qquad (61)$$

To prove the second inequality, we observe that $PM$ and $CH^{(n)}$ for even $n$ satisfy the assumption of Theorem 6, which gives the inequality (58) in an analogous way. The last inequality follows from the dependence (60) and Observation 5. To see that the boundary values of $\alpha$ are attained by non-contextual boxes, we first observe that by Theorem 5 there exists a non-contextual box $B$ with $\beta_B(B) = n-1$. Now, by Lemma 4 and equalities (50) and (51), after twirling $\tau(B)$ belongs to $I$, hence it has the form $\alpha B + (1-\alpha)B'$, where $B'$ is the opposite version of $B$. Moreover, $\beta_B(B) = \beta_B(\tau(B)) = n-1$. Now, by equation (60), we have $\beta_B(\tau(B)) = n\alpha$, proving that $\tau(B)$ attains the value $\frac{n-1}{n}$ of $\alpha$. This box is clearly non-contextual, since $\tau$ is the application, at random, of some permutation of observables composed with bit-flips on the outputs of observables, and hence preserves non-contextuality. An analogous argument, by use of Theorem 6 and Observation 2, proves the tightness of the bounds (58) and (60) respectively. $\blacksquare$

We will now need another fact that, apart from Theorem 4, simplifies the computation of $X_u$.

Lemma 7 Let $B = \{g(\lambda_c)\} \in C_G^{(n)}$ and

$$X_u(B) = \frac{1}{n} \inf_{p(\lambda)\in S} \sum_c D(g(\lambda_c)\|p(\lambda_c)), \qquad (62)$$

for some $S \subset NC_G$. If for any $p(\lambda) \in S$ there exist reversible operations $\Pi_c$ satisfying $\Pi_c(p(\lambda_c)) = p(\lambda_{c_0})$ for all contexts $c$ and simultaneously $\Pi_c(g(\lambda_c)) = g(\lambda_{c_0})$ for all contexts $c$, then

$$X_u(B) = \inf_{p(\lambda)\in S} D(g(\lambda_{c_0})\|p(\lambda_{c_0})). \qquad (63)$$

Proof.

The proof boils down to the observation that relative entropy is invariant under bilateral reversible operations. $\blacksquare$

We can state the main theorem of this section:

Theorem 7 For $B_\alpha \in PM_\alpha \cup CH^{(n)}_\alpha \cup M_\alpha$ with $n > 3$ contexts and $\alpha \ge \frac{n-1}{n}$ there holds

$$X_u(B_\alpha) = \log[(n-1)^{-\alpha} n] - h(\alpha), \qquad (64)$$

while for even $n$ and $\alpha \le \frac{1}{n}$ there holds

$$X_u(B_\alpha) = \log[(n-1)^{(\alpha-1)} n] - h(\alpha), \qquad (65)$$

where $h(\alpha) = -\alpha \log \alpha - (1-\alpha)\log(1-\alpha)$.
Proof.

We first note that in both cases we want to consider, the isotropic boxes satisfy the assumptions of Theorem 4, hence we need to optimize only over appropriate isotropic xor-boxes:

$$X_u(B_\alpha) = \min_{p(\lambda)\in I} \frac{1}{n}\sum_c D(g(\lambda_c)\|p(\lambda_c)), \qquad (66)$$

where for short by $p(\lambda) \in I$ we mean some isotropic non-contextual box $B_{\alpha_0}$ which is defined by the distribution $p(\lambda)$.

Since $B_{\alpha_0}$ has only two kinds of distributions $p(\lambda_c)$, namely $P^{\alpha_0}_{even} = \alpha_0 P_{even} + (1-\alpha_0)P_{odd}$ and $P^{\alpha_0}_{odd} = \alpha_0 P_{odd} + (1-\alpha_0)P_{even}$, we can write:

$$X_u(B_\alpha) = \min_{\alpha_0} \frac{1}{n}\left((n-1)\,D(P^{\alpha}_{even}\|P^{\alpha_0}_{even}) + D(P^{\alpha}_{odd}\|P^{\alpha_0}_{odd})\right),$$

where $\alpha_0$ is bounded such that $B_{\alpha_0}$ is noncontextual, and $P^{\alpha}_{even}$ and $P^{\alpha}_{odd}$ are defined analogously to $P^{\alpha_0}_{even}$ and $P^{\alpha_0}_{odd}$, respectively. Now, it is easy to check that if $S$ is taken to be the set of isotropic non-contextual boxes, then for $B_\alpha \in PM_\alpha \cup M_\alpha \cup CH^{(n)}_\alpha$ the assumptions of Lemma 7 are satisfied, with $\Pi_c$ being the identity operation for each of the $n-1$ contexts for which $B_\alpha$ has the same distribution, and a bit-flip on one of the observables for the distribution of the remaining context, giving:

$$X_u(B_\alpha) = \min_{\alpha_0} D(P^{\alpha}_{even}\|P^{\alpha_0}_{even}), \qquad (67)$$

where $\alpha_0$ is bounded by the fact that $P^{\alpha_0}_{even}$ is a distribution of a non-contextual box which is isotropic. More specifically, it is bounded according to Lemma 6, i.e. we have $\alpha_0 \le \frac{n-1}{n}$. It is easy to check that for $\alpha \ge \alpha_0$ the function (67) is decreasing with $\alpha_0$. Lemma 6 shows that the boundary value $\alpha_0 = \frac{n-1}{n}$ is attained by a non-contextual isotropic xor-box, hence the function attains its minimum for this value of $\alpha_0$, which proves (64). For $\alpha \le \alpha_0$, this function is increasing, and again by Lemma 6 it attains its minimal value at $\alpha_0 = \frac{1}{n}$, which proves (65). $\blacksquare$
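As a worked check of the last step, the closed forms (64) and (65) follow from evaluating (67) at the two boundary values of $\alpha_0$. The sketch below only assumes that $P_{even}$ and $P_{odd}$ have disjoint supports, so that the relative entropy between two mixtures of them reduces to a binary relative entropy.

```latex
\begin{align*}
D\!\left(P^{\alpha}_{even}\,\middle\|\,P^{\alpha_0}_{even}\right)
  &= \alpha\log\frac{\alpha}{\alpha_0} + (1-\alpha)\log\frac{1-\alpha}{1-\alpha_0}
   = -h(\alpha) - \alpha\log\alpha_0 - (1-\alpha)\log(1-\alpha_0), \\
\alpha_0 = \tfrac{n-1}{n}: \quad
  & -h(\alpha) + \alpha\log\frac{n}{n-1} + (1-\alpha)\log n
   = \log\!\left[(n-1)^{-\alpha}\, n\right] - h(\alpha), \\
\alpha_0 = \tfrac{1}{n}: \quad
  & -h(\alpha) + \alpha\log n + (1-\alpha)\log\frac{n}{n-1}
   = \log\!\left[(n-1)^{(\alpha-1)}\, n\right] - h(\alpha).
\end{align*}
```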



We note here that, due to the above theorem, $X_u(B_\alpha)$ for even $n$ and $\alpha \ge \frac{n-1}{n}$ equals the value of $X_u(B_{\alpha'})$ for $\alpha' = 1 - \alpha$, in correspondence with the fact that $B_\alpha$ can be changed by bit-flips into $B_{1-\alpha}$, which does not change the relative entropy distance.

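In particular, for the considered examples of xor-boxes, i.e. in the case $\alpha = 1$, we have:

$$X_u(PR) = \log\frac{4}{3} \approx 0.4150, \qquad (68)$$

$$X_u(PM) = \log\frac{6}{5} \approx 0.2630, \qquad (69)$$

$$X_u(M) = \log\frac{5}{4} \approx 0.3219, \qquad (70)$$

$$X_u(CH^{(n)}) = \log\frac{n}{n-1}. \qquad (71)$$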

According to the above formula, $X_u$ tends to zero in the asymptotic limit for maximally contextual chain boxes. Interestingly, if we do not take the average over the number of contexts, i.e. consider a measure $X(B) := \sum_c D(g(\lambda_c)\|p(\lambda_c))$, it will equal $n\log\left(1 + \frac{1}{n-1}\right)$ and tend asymptotically to $\log_2 e$, where $e$ is the Euler number. In other words, if we consider the natural logarithm in the definition of relative entropy, $X$ tends to 1 with increasing $n$. It means that, although the "average" contextuality of the chain box (per number of contexts) vanishes with increasing $n$, the "total" contextuality is bounded by 1 from below. Remarkably, the same result holds for quantum maximally contextual chain boxes: $X(CH^{(n)}_\alpha)$ based on the natural logarithm tends to 1 for both odd and even $n$, where $\alpha$ for each $n$ is given in the description of Fig. 2.
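The limit quoted above for the maximally contextual chain box can be checked numerically. The following is a minimal sketch: the function name `total_contextuality_chain` is hypothetical, and the only input taken from the text is the expression $n\log(n/(n-1))$, i.e. $n$ times the per-context value of Eq. (71).

```python
import math

def total_contextuality_chain(n: int, base: float = 2.0) -> float:
    """n * log(n / (n - 1)): the per-context value of Eq. (71) summed over n contexts."""
    return n * math.log(n / (n - 1), base)

# Binary logarithm: approaches log2(e); natural logarithm: approaches 1 from above.
for n in (4, 10, 100, 10_000):
    print(n, total_contextuality_chain(n), total_contextuality_chain(n, math.e))
```

With the natural logarithm the value stays above 1 for every finite $n$, which is the sense in which the "total" contextuality is bounded from below.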

For comparison, in the case of maximal violation of the CHSH inequality we have $B_{CHSH} = CH^{(4)}_\alpha$ with $\alpha = \cos^2\frac{\pi}{8}$, which gives:

$$X_u(B_{CHSH}) \approx 0.0463. \qquad (72)$$
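The numbers in (68)-(70) and (72) can be reproduced from the closed forms (64)-(65) and cross-checked against a direct minimization of (67). This is only a sketch under stated assumptions: the function names are hypothetical, the grid search stands in for the analytic monotonicity argument, and the range of $\alpha_0$ is the one given by Lemma 6.

```python
import math

def h(a):
    """Binary Shannon entropy, with 0*log(0) taken as 0."""
    return -sum(x * math.log2(x) for x in (a, 1 - a) if x > 0)

def x_u_closed_form(n, alpha):
    """Eq. (64) for alpha >= (n-1)/n, Eq. (65) for even n and alpha <= 1/n."""
    if alpha >= (n - 1) / n:
        return math.log2(n) - alpha * math.log2(n - 1) - h(alpha)
    if n % 2 == 0 and alpha <= 1 / n:
        return math.log2(n) - (1 - alpha) * math.log2(n - 1) - h(alpha)
    raise ValueError("alpha lies outside the ranges covered by Eqs. (64)-(65)")

def binary_kl(a, a0):
    """D(P^a_even || P^a0_even), assuming P_even and P_odd have disjoint supports."""
    total = 0.0
    for x, y in ((a, a0), (1 - a, 1 - a0)):
        if x > 0:
            if y == 0:
                return math.inf
            total += x * math.log2(x / y)
    return total

def x_u_grid(n, alpha, steps=10_000):
    """Minimize Eq. (67) over alpha_0 in the non-contextual range of Lemma 6."""
    lo = 1 / n if n % 2 == 0 else 0.0
    hi = (n - 1) / n
    return min(binary_kl(alpha, lo + (hi - lo) * k / steps) for k in range(steps + 1))

print(x_u_closed_form(4, 1.0))                          # PR box, Eq. (68)
print(x_u_closed_form(6, 1.0))                          # PM box, Eq. (69)
print(x_u_closed_form(5, 1.0))                          # Mermin's star, Eq. (70)
print(x_u_closed_form(4, math.cos(math.pi / 8) ** 2))   # CHSH, Eq. (72)
```

The grid minimum always sits at an endpoint of the allowed interval, which is the numerical counterpart of the statement that (67) is monotonic in $\alpha_0$ on either side of $\alpha$.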

1. Klyachko box 

Here we give the argument for the calculation of $X_u$ for the Klyachko box. The set of joint probability distributions $p(\lambda)$ has $2^5 = 32$ extremal points. Again, as in the case of xor-boxes, we apply twirling determined by a group which turns out to be the dihedral group $D_5$, consisting of $d = 10$ permutations. Due to symmetrization, the marginal probability distributions calculated for the extremal points turn out to be context independent, $p(\lambda_c) = p$, and to possess an additional symmetry $p(01) = p(10)$. As a consequence, the 32 extremal points form 8 orbits under the action of the group $D_5$ (subsets of the 32 extremal points invariant under $D_5$), and the box can be characterized by only two parameters, e.g. $p(00)$ and $p(11)$, and conveniently visualized by 8 points on a triangle plot, as shown in Fig. 3. Thus the set of all symmetrized non-contextual distributions is the set of convex combinations of 4 distributions, namely $p(00) = 1$; $p(11) = 1$; $p(00) = \frac{1}{5}$ with $p(10) = p(01) = \frac{2}{5}$; and finally $p(11) = \frac{1}{5}$ with $p(10) = p(01) = \frac{2}{5}$.

FIG. 3: The set of non-contextual distributions compatible with the Klyachko box, after twirling, is situated within the trapezoid formed by the triangle without the vertex $p(01)+p(10)$. The bold points denote orbits, i.e. subsets of the 32 extremal points invariant under $D_5$. (Triangle plot with vertices $p(00)$, $p(01)+p(10)$ and $p(11)$.)

Now our goal is to find the minimum in $X_u(K)$ which, due to all the mentioned symmetries, is given simply by

$$X_u(K) = \min_p D(g(\lambda_c)\|p). \qquad (73)$$

Note that $g(11) = 0$, so that looking for a minimum we can restrict to the case of $p(11) = 0$. In this way the problem of calculating $X_u$ has been reduced to finding the minimum over a single parameter $p(00)$ in the range between 0.2 and 1:

$$X_u(K) = \min_{0.2 \le p(00) \le 1} \chi(g(00), p(00)), \qquad (74)$$

where

$$\chi(x,y) = x\log\left(\frac{x}{y}\right) + (1-x)\log\left(\frac{1-x}{1-y}\right). \qquad (75)$$

$\chi(x,y)$ is a strictly increasing function of the argument $y$ provided that $0 < x < y < 1$, which is seen from the equation

$$\frac{\partial\chi(x,y)}{\partial y} = \frac{y-x}{y(1-y)}. \qquad (76)$$

So finally we get $X_u(K) = \chi(g(00), 0.2) \approx 0.0466576$.
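The orbit count and the final value can be reproduced with a short enumeration. This is a sketch under explicit assumptions not stated in this section: the five observables are labelled around a pentagon with contexts given by adjacent pairs, twirling is modelled by averaging the marginals over the five contexts, and the box value $g(00) = 1 - 2/\sqrt{5}$ is assumed (it corresponds to the maximal quantum violation and reproduces the quoted number). All names are hypothetical.

```python
import math
from fractions import Fraction
from itertools import product

N = 5  # assumed labelling: observables on a pentagon, contexts are adjacent pairs
CONTEXTS = [(i, (i + 1) % N) for i in range(N)]

def twirled_point(assignment):
    """(p(00), p(11)) of a deterministic assignment, averaged over the 5 contexts."""
    p00 = Fraction(sum(assignment[i] == 0 and assignment[j] == 0 for i, j in CONTEXTS), N)
    p11 = Fraction(sum(assignment[i] == 1 and assignment[j] == 1 for i, j in CONTEXTS), N)
    return p00, p11

# The 32 deterministic assignments collapse to distinct points, one per orbit.
points = {twirled_point(a) for a in product((0, 1), repeat=N)}

def chi(x, y):
    """Eq. (75), with binary logarithms."""
    return x * math.log2(x / y) + (1 - x) * math.log2((1 - x) / (1 - y))

g00 = 1 - 2 / math.sqrt(5)  # assumption: value of g(00) for the Klyachko box
p00_min = min(p00 for p00, p11 in points if p11 == 0)  # lower end of the range in Eq. (74)
x_u_K = chi(g00, float(p00_min))
print(len(points), p00_min, x_u_K)
```

Since $g(00) < 0.2$, the monotonicity expressed by Eq. (76) places the minimum at the lower end of the allowed range, which is why only `p00_min` is evaluated.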

F. Additivity results 

In this section we prove that for exemplary xor-boxes $X_u$ is additive, and that it is 2-copy additive for the isotropic xor-boxes considered in this paper. We begin with a definition and the necessary lemmas. The main results are Theorems 8 and 9, and their main application is stated in Corollary 3.

Definition 4 For any two hypergraphs $G_1 = (V_{G_1}, E_{G_1})$ and $G_2 = (V_{G_2}, E_{G_2})$, we define their tensor product to be the hypergraph

$$G_1 \otimes G_2 := (V_{G_1 \otimes G_2}, E_{G_1 \otimes G_2}), \qquad (77)$$

where $V_{G_1 \otimes G_2} := (V_{G_1} \cup V_{G_2})$ and $E_{G_1 \otimes G_2} := \{c \cup c' \,|\, c \in E_{G_1} \text{ and } c' \in E_{G_2}\}$. For two boxes $B_1 = \{g_1(\lambda_c)\}$ and $B_2 = \{g_2(\lambda_{c'})\}$ compatible with hypergraphs $G_1$ and $G_2$ respectively, their tensor product is a box compatible with $G_1 \otimes G_2$ given by $B_1 \otimes B_2 := \{g_1(\lambda_c) g_2(\lambda_{c'})\}$, such that the distribution of its context $c \cup c'$ is a product of the distributions $g_1(\lambda_c)$ and $g_2(\lambda_{c'})$.
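Definition 4 can be illustrated with a small construction. This is a sketch only: the data layout (contexts as tuples of vertex labels, distributions as dictionaries from outcome tuples to probabilities), the tagging of vertices by copy number to keep the two vertex sets disjoint, and the example box on a 4-cycle are all assumptions made for illustration.

```python
from itertools import product

def tensor_hypergraph(edges1, edges2):
    """Contexts of G1 (x) G2: unions c U c', with vertices tagged by copy number."""
    return [frozenset((1, v) for v in c) | frozenset((2, v) for v in c2)
            for c, c2 in product(edges1, edges2)]

def tensor_box(box1, box2):
    """Product box: the distribution of context c U c' is g1(c) * g2(c')."""
    out = {}
    for (c, g1), (c2, g2) in product(box1.items(), box2.items()):
        ctx = tuple((1, v) for v in c) + tuple((2, v) for v in c2)
        out[ctx] = {a + b: pa * pb
                    for (a, pa), (b, pb) in product(g1.items(), g2.items())}
    return out

# Illustrative xor-box on a 4-cycle: three contexts with P_even, one with P_odd.
P_EVEN = {(0, 0): 0.5, (1, 1): 0.5}
P_ODD = {(0, 1): 0.5, (1, 0): 0.5}
chain4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
box = {c: (P_ODD if k == 3 else P_EVEN) for k, c in enumerate(chain4)}

product_box = tensor_box(box, box)
print(len(product_box))  # number of contexts of the two-copy box
```

For a box with $n$ contexts the two-copy box has $n^2$ contexts, which is the normalization that appears when $X_u$ is averaged over contexts of $B^{\otimes 2}$.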

We now make an observation which characterizes the set of noncontextual boxes of the tensor product of two identical hypergraphs.

Observation 3 The set of noncontextual boxes $NC_{G^{\otimes 2}}$ belonging to $C_G^{(n)} \otimes C_G^{(n)}$ is spanned by tensor products of extremal points of the set $NC_G$.
Proof.

To see this, consider an extremal point of $NC_{G^{\otimes 2}}$. It is equal to a box with a joint distribution over $2n$ observables $\delta_{a,a_0}$ for some $a_0$. Such a distribution is a product of distributions $\delta_{a_1,a_{01}}$ and $\delta_{a_2,a_{02}}$, where $a_1$ and $a_2$ are strings of outputs $a_i$ with each $a_i \in \{0, 1, \ldots, d_i\}$, and $a_{01}$ and $a_{02}$ are some fixed output strings. Writing $a_1$ and $a_2$ in a system with base $d$ (assuming that all of the $d_i$ are equal, otherwise one has to consider a multibase system) and concatenating them yields $a$. Hence any extremal point of $NC_{G^{\otimes 2}}$ is a product of extremal points of the set $NC_G$. $\blacksquare$

We will also need a lemma, stated in general for linear operations, which will be used for the twirling operation:

Lemma 8 After any linear operation $\tau$ on any convex set $Y$, the set of extremal points of the image set $\tau(Y)$ is a subset of the set of images of extremal points of $Y$ through $\tau$.
Proof.

Consider any point $\tau(B)$ which is an image of a non-extremal point $B$ in $Y$. Since $\tau$ is linear, we have $\tau(B) = \tau(p_1 B_1 + (1-p_1)B_2) = p_1\tau(B_1) + (1-p_1)\tau(B_2)$, hence it cannot be extremal in $\tau(Y)$. Thus, any extremal point in $\tau(Y)$ must be an image of an extremal point in $Y$. $\blacksquare$

This enables us to state the following observation:

Observation 4 For any linear map $\tau$ the set $\tau \otimes \tau(NC_{G^{\otimes 2}})$ is spanned by tensor products of extremal points of the set $\tau(NC_G)$.
Proof.

By Lemma 8, the only extremal points in $\tau \otimes \tau(NC_{G^{\otimes 2}})$ are within the set of images of extremal points through $\tau$. We know from Observation 3 that extremal points of $\tau \otimes \tau(NC_{G^{\otimes 2}})$ are of the form $\tau(E_i) \otimes \tau(E_j)$, where $E_k$ are extremal points of $NC_G$. Now if $\tau(E_i)$ is not extremal in $\tau(NC_G)$, i.e. it can be decomposed into $\sum_i p_i \tau(E_{k_i})$, then clearly the image $\tau \otimes \tau(E_i \otimes E_j)$ for any $j$ is not extremal in $\tau \otimes \tau(NC_{G^{\otimes 2}})$, as it can be decomposed into a nontrivial mixture $\sum_i p_i \tau(E_{k_i}) \otimes \tau(E_j)$. The same argument holds for $\tau(E_j)$: it cannot be non-extremal in $\tau(NC_G)$ if the pair $\tau(E_i) \otimes \tau(E_j)$ is extremal in $\tau \otimes \tau(NC_{G^{\otimes 2}})$. Hence, the only extremal points in $\tau \otimes \tau(NC_{G^{\otimes 2}})$ are the tensor products of extremal points in $\tau(NC_G)$. $\blacksquare$

In what follows, for two arbitrary boxes $B_1$ and $B_2$, by the interval $[B_1, B_2]$ we mean the set $\{pB_1 + (1-p)B_2 \,|\, p \in [0,1]\}$.

Lemma 9 Let box $B = \{g(\lambda_c)\} \in C_G^{(n)}$ be invariant under some linear operation $\tau$, which maps all boxes in $C_G^{(n)}$ into the interval $[B_e, B'_e]$ and maps $NC_G$ into the interval $[L, L'] \subseteq [B_e, B'_e]$. Let also some of the $g(\lambda_c)$ be equal to $g(\lambda_{c_0})$ and the rest of the $g(\lambda_c)$ be equal to $\Pi(g(\lambda_{c_0}))$ for some reversible operation $\Pi$. Then, there holds:

$$X_u(B^{\otimes 2}) = \inf_{P_{nc} \in \tau\otimes\tau(NC_{G^{\otimes 2}})} D(g(\lambda_{c_0})g(\lambda_{c_0})\|p_1) \qquad (78)$$

where $g(\lambda_{c_0})g(\lambda_{c_0})$ is a product of distributions $g(\lambda_{c_0})$ and $p_1$ is the distribution of some fixed context number 1 of $P_{nc}$.

Proof: Let $n_1$ be the number of the contexts of $B$ with the same distribution $g(\lambda_{c_0})$ and $n_2$ the number of the remaining contexts with distribution $\Pi(g(\lambda_{c_0}))$. In what follows, we identify $g(\lambda_{c_0})$ with $q$ and $\Pi(g(\lambda_{c_0}))$ with $\tilde{q}$ for short. We know that

$$X_u(B^{\otimes 2}) = \inf_{P_{nc}} \frac{1}{n^2}\sum_{c,c'} D(g(\lambda_c)g(\lambda_{c'})\|p_{cc'}) \qquad (79)$$

where $n^2$ is the total number of contexts and $P_{nc} = \{p_{cc'}\}$. From Observations 3 and 4, the box $P_{nc}$ can be written as

$$P_{nc} = p_1 L \otimes L + p_2 L \otimes B'_e + p_3 B'_e \otimes L + p_4 B'_e \otimes B'_e. \qquad (80)$$

We switch now from equality for boxes to equality for contexts, using for short the notation $B_e = \{e_c\}$, meaning that $e_c$ is the distribution of context number $c$ of the box $B_e$, and similarly $B'_e = \{e'_c\}$, $L = \{l_c\}$ and $L' = \{l'_c\}$. The above equality gives for each $c$ and $c'$:

$$p_{cc'} = p_1 l_c l_{c'} + p_2 l_c e'_{c'} + p_3 e'_c l_{c'} + p_4 e'_c e'_{c'}. \qquad (81)$$

Now, consider the following 4 cases, where due to $[L, L'] \subseteq [B_e, B'_e]$ we can set $L = sB_e + (1-s)B'_e$ for some $s \in [0,1]$.

Case 1. $\forall\, c \in \{n_1\},\ c' \in \{n_1\}$:

$$D(g(\lambda_c)g(\lambda_{c'})\|p_{cc'}) = D\big(qq \,\big\|\, p_1(sq + (1-s)\tilde{q})(sq + (1-s)\tilde{q}) + p_2(sq + (1-s)\tilde{q})\tilde{q} + p_3\tilde{q}(sq + (1-s)\tilde{q}) + p_4\tilde{q}\tilde{q}\big) \qquad (82)$$

Case 2. $\forall\, c \in \{n_2\},\ c' \in \{n_1\}$:

$$D(g(\lambda_c)g(\lambda_{c'})\|p_{cc'}) = D\big(\tilde{q}q \,\big\|\, p_1(s\tilde{q} + (1-s)q)(sq + (1-s)\tilde{q}) + p_2(s\tilde{q} + (1-s)q)\tilde{q} + p_3 q(sq + (1-s)\tilde{q}) + p_4 q\tilde{q}\big) \qquad (83)$$

Applying the reversible operation $\Pi_{cc'} = \Pi \otimes I$ to both distributions in the above relative entropy term, we get

$$D\big(qq \,\big\|\, p_1(sq + (1-s)\tilde{q})(sq + (1-s)\tilde{q}) + p_2(sq + (1-s)\tilde{q})\tilde{q} + p_3\tilde{q}(sq + (1-s)\tilde{q}) + p_4\tilde{q}\tilde{q}\big) \qquad (84)$$

which is exactly the relative entropy term in (82). Similarly, by considering the other two cases, where $c \in \{n_1\}$ and $c' \in \{n_2\}$, and where $c \in \{n_2\}$ and $c' \in \{n_2\}$, we get the same equality after applying the reversible operations $\Pi_{cc'} = I \otimes \Pi$ and $\Pi_{cc'} = \Pi \otimes \Pi$ respectively, and the assertion follows. $\blacksquare$



Observation 5 In general, Lemma 9 holds for $n$ copies, i.e.

$$X_u(B^{\otimes n}) = \inf_{P_{nc} \in \tau^{\otimes n}(NC_{G^{\otimes n}})} D(g(\lambda_{c_0})g(\lambda_{c_0})\cdots\|p_1) \qquad (85)$$

Proof.

The proof goes in full analogy to that of Lemma 9. $\blacksquare$

We can now state one of the main theorems of this section.

Theorem 8 Let box $B = \{g(\lambda_c)\} \in C_G^{(n)}$ and let the image of $C_G^{(n)}$ through $\tau$ be the interval $[B_e, B'_e]$ and the image of the set $NC_G$ be $[L, L'] \subseteq [B_e, B'_e]$, such that $L = sB_e + (1-s)B'_e$ with $s > \frac{1}{2}$ and $B = rB_e + (1-r)B'_e$ with $r > s$. Let also $B_e = \{e_c\}$ and $B'_e = \{e'_c\}$, such that $e_c$ has disjoint support from $e'_c$. Then there holds

$$X_u(B \otimes B) = 2X_u(B). \qquad (86)$$



Proof.

We first note that by Theorem 4, with the set of automorphisms of $G^{\otimes 2}$ being the set of all tensor products of automorphisms of $G$ with themselves, we have:

$$X_u(B^{\otimes 2}) = \inf_{P_{nc} \in \tau\otimes\tau(NC_{G^{\otimes 2}})} \frac{1}{n^2}\sum_{c,c'} D(g(\lambda_c)g(\lambda_{c'})\|p_{cc'}). \qquad (87)$$

Now, by Lemma 9 we have

$$X_u(B^{\otimes 2}) = \inf_{P_{nc} \in \tau\otimes\tau(NC_{G^{\otimes 2}})} D(qq\|p_{ij}) \qquad (88)$$

where $q = re_i + (1-r)e'_i$ (also $q = re_j + (1-r)e'_j$), with $r > s$ by assumption, and the indices $i,j$ represent a fixed context of $P_{nc}$ such that all distributions of $P_{nc}$ are transformable into it by operations which at the same time transform all distributions of $B^{\otimes 2}$ into $qq$. By Theorem 4, and using the fact that $qq$ is invariant under swap (it can be achieved by local or global swap operations depending on the hypergraph under consideration), it is equivalent to the quantity:

$$X_u(B^{\otimes 2}) = \inf_{p_1+p_2+p_3=1} D\Big(qq \,\Big\|\, p_1 l_i l_j + \frac{p_2}{2}(l_i l'_j + l'_i l_j) + p_3 l'_i l'_j\Big) \qquad (89)$$



Note that we can relax the minimization, and hence we have the following lower bound:

$$X_u(B^{\otimes 2}) \ge \inf_{p_1+p_2+p_3=1} D\Big(qq \,\Big\|\, p_1 l_i l_j + \frac{p_2}{2}(l_i e'_j + e'_i l_j) + p_3 e'_i e'_j\Big). \qquad (90)$$

If we are lucky enough to find a solution that is non-contextual, then we will have found the solution to our initial minimization problem. As we will see, this will be the case.

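Using the fact that $e_i$ ($e_j$) and $e'_i$ ($e'_j$) have disjoint supports, and decomposing $l_i = se_i + (1-s)e'_i$ and $l_j = se_j + (1-s)e'_j$, we get that:

$$X_u(B^{\otimes 2}) \ge \inf_{p_1+p_2+p_3=1}\Big[ r^2\sum_a p^{(a)}_{e_ie_j}\log\Big(\frac{r^2 p^{(a)}_{e_ie_j}}{p_1 s^2\, p^{(a)}_{e_ie_j}}\Big) + 2r(1-r)\sum_a p^{(a)}_{e_ie'_j}\log\Big(\frac{r(1-r)\,p^{(a)}_{e_ie'_j}}{\big(p_1 s(1-s) + \frac{p_2 s}{2}\big)\,p^{(a)}_{e_ie'_j}}\Big) + (r-1)^2\sum_a p^{(a)}_{e'_ie'_j}\log\Big(\frac{(r-1)^2\,p^{(a)}_{e'_ie'_j}}{\big(p_1(1-s)^2 + p_2(1-s) + p_3\big)\,p^{(a)}_{e'_ie'_j}}\Big)\Big] \qquad (91)$$

where $\{p^{(a)}_{e_ie_j}\}$ is the distribution of $e_ie_j$. For $r = 1$, i.e. when $q = e_i$, which is the case for the PR-box, the PM-box, Mermin's star and the CH-box, we have that

$$X_u(B^{\otimes 2}) \ge \inf_{p_1+p_2+p_3=1} \log\frac{1}{p_1 s^2} \qquad (92)$$

where the right-hand side is clearly minimal for $p_1 = 1$, which means that the closest distribution in our set is non-contextual, equal to $l_il_j$, hence

$$X_u(B^{\otimes 2}) = \log\frac{1}{s^2} = 2X_u(B). \qquad (93)$$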



Consider now $r < 1$. Here we are able to prove additivity for 2 copies by using the Lagrange multipliers approach. We need to find the infimum in

$$X_u(B^{\otimes 2}) \ge \inf_{p_1+p_2+p_3=1}\Big[ r^2\log\Big(\frac{r^2}{p_1 s^2}\Big) + 2r(1-r)\log\Big(\frac{r(1-r)}{p_1 s(1-s) + \frac{p_2 s}{2}}\Big) + (r-1)^2\log\Big(\frac{(r-1)^2}{p_1(1-s)^2 + p_2(1-s) + p_3}\Big)\Big]. \qquad (94)$$

We first check if the infimum is attained in the interior of the simplex of the boundary conditions. Using Mathematica v07 we obtain that there is only 1 solution of the set of Lagrange equations:

$$p_1 = \frac{r^2}{s^2}, \qquad p_2 = \frac{2r(-r+s)}{s^2}, \qquad p_3 = \frac{(r-s)^2}{s^2}. \qquad (95)$$



However, we have that $q$ is contextual, so $r > s$, which gives that $p_2$ of the above solution is negative. Hence the function does not have an infimum in the interior, in the considered region of parameters $r$ and $s$. It suffices to consider the boundaries, i.e. the cases $p_3 = 0$, $p_2 = 0$ and $p_2 = p_3 = 0$ (other cases are excluded by the fact that $p_1 > 0$). Again using the Lagrange multipliers method, we solve the first two cases. The first one has two solutions, which have $p_2 < 0$ if $s > \frac{1}{2}$. In the case $p_2 = 0$ we observe that $p_1 > 1$, which finally proves that the only solution that contributes to the infimum is $p_1 = 1$, which is a non-contextual solution as in the case of extremal $q$, and that yields additivity, i.e. $X_u(B^{\otimes 2}) = 2X_u(B)$. $\blacksquare$
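The conclusion of the proof can be sanity-checked numerically. The sketch below replaces the Lagrange-multiplier analysis with a brute-force grid search over the simplex $p_1 + p_2 + p_3 = 1$; the bracket of Eq. (94) is taken with middle denominator $p_1 s(1-s) + p_2 s/2$, as obtained by expanding Eq. (90). Function names, the grid resolution and the sample values of $r$ and $s$ are illustrative assumptions.

```python
import math

def objective(p1, p2, p3, r, s):
    """Bracket of Eq. (94) in bits; inf when a required model weight vanishes."""
    terms = (
        (r * r, r * r, p1 * s * s),
        (2 * r * (1 - r), r * (1 - r), p1 * s * (1 - s) + p2 * s / 2),
        ((1 - r) ** 2, (1 - r) ** 2, p1 * (1 - s) ** 2 + p2 * (1 - s) + p3),
    )
    total = 0.0
    for weight, num, den in terms:
        if weight == 0:
            continue  # terms with zero weight drop out (e.g. r = 1)
        if den <= 0:
            return math.inf
        total += weight * math.log2(num / den)
    return total

def single_copy(r, s):
    """One-copy value D(q || l) for q = r e + (1-r) e', l = s e + (1-s) e'."""
    return sum(x * math.log2(x / y) for x, y in ((r, s), (1 - r, 1 - s)) if x > 0)

def grid_minimum(r, s, steps=50):
    """Smallest objective value on a regular grid of the simplex, with its location."""
    best_val, best_pt = math.inf, None
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            pt = (i / steps, j / steps, (steps - i - j) / steps)
            val = objective(*pt, r, s)
            if val < best_val:
                best_val, best_pt = val, pt
    return best_val, best_pt

for r, s in ((1.0, 0.75), (0.9, 0.75), (0.7, 0.6)):
    val, pt = grid_minimum(r, s)
    print(r, s, pt, val, 2 * single_copy(r, s))
```

Because the objective is convex in $(p_1, p_2, p_3)$, a grid minimum located at the vertex $p_1 = 1$ is consistent with that vertex being the global minimizer for the sampled parameters; this is a numerical illustration, not a substitute for the proof.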

Theorem 9 Under the assumptions of Theorem 8, $X_u$ is additive on $B_e$ and $B'_e$.
Proof.

In the calculation of relative entropy for this case, we will have similar terms as in Eq. (90) but with $n$ copies. By Observation 5 we have:

$$X_u(B^{\otimes n}) \ge \inf D(qq\ldots\|p_1 l_1 l_2 \ldots + T) \qquad (96)$$

where $T$ is all the other possible terms of $l_n$'s and $e_n$'s with weights $p_n$. Note here that the $l_n$'s and $e_n$'s are all some fixed context. Since $l_n = se_n + (1-s)e'_n$ for all $n$, we have

$$X_u(B^{\otimes n}) \ge \inf D(r^n e_1 e_2 \ldots + T_1\|p_1 s^n e_1 e_2 \ldots + T'_1) \qquad (97)$$

where $T_1$ and $T'_1$ are all the other possible terms of $e_n$'s and $e'_n$'s, with $T_1$ having powers of $(1-r)$ while $T'_1$ have different weights $p_i$. Hence

$$X_u(B^{\otimes n}) \ge \inf \Big[\sum_a r^n p^{(a)}_{e_1e_2\ldots}\log\Big(\frac{r^n p^{(a)}_{e_1e_2\ldots}}{p_1 s^n p^{(a)}_{e_1e_2\ldots}}\Big) + T_2\Big] \qquad (98)$$

where $T_2$ contains terms with powers of $(1-r)$. For extremal points $r = 1$, therefore

$$X_u(B^{\otimes n}) \ge \inf \sum_a p^{(a)}_{e_1e_2\ldots}\log\Big(\frac{p^{(a)}_{e_1e_2\ldots}}{p_1 s^n p^{(a)}_{e_1e_2\ldots}}\Big) \qquad (99)$$

$$X_u(B^{\otimes n}) \ge \inf \sum_a p^{(a)}_{e_1e_2\ldots}\log\Big(\frac{1}{p_1 s^n}\Big) \qquad (100)$$

Since $\sum_a p^{(a)}_{e_1e_2\ldots} = 1$ and the minimum is attained at $p_1 = 1$, we obtain the desired result:

$$X_u(B^{\otimes n}) \ge n\log\frac{1}{s} = nX_u(B). \qquad (101)$$

Analogously we can show additivity of $X_u$ on $B'_e$ by exchanging $l_n$ with $l'_n$ and $e_n$ with $e'_n$. $\blacksquare$

Corollary 3 For a box $B \in I_{PM} \cup I_{M} \cup I_{CH^{(n)}}$, $X_u(B^{\otimes 2}) = 2X_u(B)$. For $B \in PM \cup M \cup CH^{(n)} \cup PM' \cup M' \cup CH'^{(n)}$, $X_u$ is additive.
Proof.

To see the first statement, it suffices to check that the box $B \in I_{PM} \cup I_{M} \cup I_{CH^{(n)}}$ satisfies the assumptions of Theorem 8. The second is a direct result of Theorem 9. $\blacksquare$

xor-games, i.e. such games for which the payoffs are functions only of the xor of the outputs [55].



E.g. $PM'$ does not violate inequality (55) with $n = 6$.

All logarithms in this paper are binary.

In usual notation, if $P(a,b|x,y)$ is a probability table of a box with probability distribution of contexts $g(\lambda_c)$ (i.e. $c = (x,y)$ and $P(a,b|x=i,y=j) = g(\lambda_{i,j})$), then the inequality (12) reads $1 \le \sum_{i=0,j=0}^{1}\sum_{a,b=0}^{1} P(i \cdot j = a \oplus b) \le 3$.



